Watching Jamie build the model
library(tidyverse)
library(GGally)
library(modelr)
library(janitor)
avocados <- clean_names(read_csv("data/avocado.csv"))
head(avocados)
Ok, we have 14 variables. Can already see that some of them are somewhat useless (x1 for example). Not sure whether the total_bags variable is the sum of small_bags, large_bags and x_large_bags so I’ll check that first.
# check to see if total_bags variable is just the sum of the other three
avocados %>%
mutate(total_sum = small_bags + large_bags + x_large_bags) %>%
select(total_bags, total_sum)
Yep, the total_bags column is just the sum of the other three. So this is another variable I can get rid of. I can also check the same for volume:
# check to see if total_volume variable is just the sum of the other three
avocados %>%
mutate(total_sum = x4046 + x4225 + x4770) %>%
select(total_volume, total_sum)
Nope, these aren’t the same, so we can keep all these in.
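Eyeballing the first few rows works, but with thousands of rows it can be safer to let R do the comparison. Here is a minimal sketch of that idea on a made-up toy data frame (the values are invented for illustration, only the column names come from the avocado data). It uses a numeric tolerance, on the assumption that the CSV values may carry small rounding differences.

```r
# Toy stand-in for the avocado data (values are made up for illustration)
toy <- data.frame(
  small_bags   = c(10.5, 20.25, 0),
  large_bags   = c(1.5, 2.75, 3),
  x_large_bags = c(0, 1, 0.5),
  total_bags   = c(12, 24, 3.5)
)

# Row-wise sum of the three component columns
component_sum <- toy$small_bags + toy$large_bags + toy$x_large_bags

# all.equal() compares with a small numeric tolerance, so floating-point
# rounding does not produce false mismatches
bags_match <- isTRUE(all.equal(toy$total_bags, component_sum))

# Count the rows that differ by more than an (arbitrary) tolerance of 0.01
n_mismatch <- sum(abs(toy$total_bags - component_sum) > 0.01)

bags_match
n_mismatch
```

If `n_mismatch` comes back as zero on the real data, the total column carries no extra information and can be dropped.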
Now let’s check how many different levels of each categorical variable we have.
avocados %>%
distinct(region) %>%
summarise(number_of_regions = n())
avocados %>%
distinct(date) %>%
summarise(
number_of_dates = n(),
min_date = min(date),
max_date = max(date)
)
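The number of levels matters because lm() turns a factor with k levels into k - 1 dummy columns, each with its own coefficient. A small sketch of this on a made-up three-region data frame (the prices are invented, and only three region names are used to keep it readable):

```r
# Toy data: three regions, two observations each (prices are made up)
toy <- data.frame(
  average_price = c(1.2, 1.5, 0.9, 1.1, 1.4, 1.0),
  region = factor(c("Albany", "Boston", "Chicago",
                    "Albany", "Boston", "Chicago"))
)

# model.matrix() shows the design matrix that lm() would build
design <- model.matrix(average_price ~ region, data = toy)

n_levels <- nlevels(toy$region)
n_dummy_columns <- ncol(design) - 1  # subtract the intercept column

# The first level (Albany) is absorbed into the intercept as the baseline
colnames(design)
n_levels
n_dummy_columns
```

So a variable with dozens of levels adds dozens of coefficients to the model, which is worth keeping in mind before throwing it in.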
The region variable will lead to many categorical levels, but we can try leaving it in. We should also examine date and perhaps pull out from it whatever features we can. Including every single date would be too much, so we can extract the different parts of the date that might be useful. For example, we could try and split it into different quarters, or years.
So, let’s do this now. Remove the variables we don’t need, change our categorical variables to factors, and extract parts of the date in case they are useful (and get rid of date).
library(lubridate)
trimmed_avocados <- avocados %>%
mutate(
quarter = as_factor(quarter(date)),
year = as_factor(year),
type = as_factor(type),
region = as_factor(region)
) %>%
select(-c(x1, date, total_bags))
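To see what the date extraction step produces, here is a small sketch on four made-up dates. It uses quarter() and year() from lubridate, wrapped in base factor() rather than as_factor() so that only one package is needed. The dates themselves are invented for illustration.

```r
library(lubridate)

# Four made-up dates, one in each quarter
toy <- data.frame(
  date = as.Date(c("2015-01-04", "2015-05-17", "2016-08-28", "2017-12-31"))
)

# quarter() returns 1 to 4, year() returns the calendar year;
# converting to factors means lm() treats them as categories, not numbers
toy$quarter <- factor(quarter(toy$date))
toy$year <- factor(year(toy$date))

toy
```

Treating quarter as a factor lets the model fit a separate price shift for each quarter instead of forcing a straight-line trend from quarter 1 to quarter 4.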
Now we’ve done our cleaning, we can check for aliased variables (i.e. combinations of variables in which one or more of the variables can be calculated exactly from other variables):
alias(average_price ~ ., data = trimmed_avocados)
Nice, we don’t find any aliases. So we can keep going.
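For comparison, this is roughly what alias() reports when there is a problem. The sketch below builds a made-up data frame in which x3 is deliberately the exact sum of x1 and x2 (all names and values here are hypothetical), then runs alias() with and without the redundant column.

```r
# Made-up data: x3 is constructed as an exact combination of x1 and x2
toy <- data.frame(
  y  = c(3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 15.2, 16.9),
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  x2 = c(2, 1, 4, 3, 6, 5, 8, 7)
)
toy$x3 <- toy$x1 + toy$x2

# With the redundant column, alias() returns a "Complete" matrix
# whose rows name the aliased terms
aliased <- alias(y ~ x1 + x2 + x3, data = toy)

# Without it, there is no "Complete" component at all
clean <- alias(y ~ x1 + x2, data = toy)

rownames(aliased$Complete)
is.null(clean$Complete)
```

Had we left total_bags in alongside its three components, we would expect to see it flagged in the same way.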
We need to decide which variable we want to put in our model first. To do this, we should visualise the data. Because we have so much data, ggpairs() might take a while to run, so we can split it up a bit.
# let's start by plotting the volume variables
trimmed_avocados %>%
select(average_price, total_volume, x4046, x4225, x4770) %>%
ggpairs() +
theme_grey(base_size = 8) # font size of labels
Hmm, these look highly correlated with one another in some instances. This is a sign that we won’t have to include all of these in our model, so we could think about removing x4225 and x4770 from our dataset to give ourselves fewer variables.
trimmed_avocados <- trimmed_avocados %>%
select(-x4225, -x4770)
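The visual impression from the pairs plot can be backed up with numbers. Here is a minimal sketch, on made-up values, that computes the correlation matrix with cor() and lists the predictor pairs above a threshold. The 0.9 cut-off is an arbitrary choice for illustration, not a rule.

```r
# Made-up values that mimic strongly related volume columns
toy <- data.frame(
  total_volume  = c(100, 220, 310, 405, 520, 590),
  x4046         = c(40, 95, 120, 170, 210, 240),
  x4225         = c(45, 90, 140, 165, 220, 250),
  average_price = c(1.6, 1.3, 1.5, 1.1, 1.2, 1.0)
)

# Correlations between predictors only (the response is left out)
predictors <- setdiff(names(toy), "average_price")
pred_cors <- cor(toy[predictors])

# Keep each pair once (upper triangle) where |r| exceeds the threshold
idx <- which(abs(pred_cors) > 0.9 & upper.tri(pred_cors), arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = predictors[idx[, "row"]],
  var2 = predictors[idx[, "col"]],
  r    = pred_cors[idx]
)

high_pairs
```

Pairs that show up in a table like this are candidates for keeping only one of the two.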
In terms of variables that correlate well with average_price… well, none of them do, that well. But that’s life. Our x4046 variable is probably our first candidate.
Next we can look at our bag variables.
trimmed_avocados %>%
select(average_price, small_bags, large_bags, x_large_bags) %>%
ggpairs() +
theme_grey(base_size = 8) # font size of labels
Hmm, again… not that promising. Some of the variables are highly correlated with one another, but not much seems highly correlated with average_price.
We can look at some of our categorical variables next:
trimmed_avocados %>%
select(average_price, type, year, quarter) %>%
ggpairs() +
theme_grey(base_size = 8) # font size of labels
This seems better! Our type variable seems to show variation in the boxplots. This might suggest that conventional avocados and organic ones have different prices (which makes sense).
Finally, we can make a boxplot of our region variable. Because this has so many levels, it makes sense to plot it by itself so we can see it.
trimmed_avocados %>%
ggplot(aes(x = region, y = average_price)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
Ok, seems there is some variation in the boxplots between different regions, so that seems like it could be promising.
Let’s start by testing competing models. We decided that x4046, type, and region seemed reasonable:
library(ggfortify)
# build the model
model1a <- lm(average_price ~ x4046, data = trimmed_avocados)
# check the diagnostics
autoplot(model1a)
# check the summary output
summary(model1a)
Call:
lm(formula = average_price ~ x4046, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-0.98539 -0.29842 -0.03531 0.25459 1.82475
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.425e+00 2.993e-03 476.29 <2e-16 ***
x4046 -6.631e-08 2.305e-09 -28.77 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3939 on 18247 degrees of freedom
Multiple R-squared: 0.0434, Adjusted R-squared: 0.04334
F-statistic: 827.8 on 1 and 18247 DF, p-value: < 2.2e-16
# build the model
model1b <- lm(average_price ~ type, data = trimmed_avocados)
# check the diagnostics
autoplot(model1b)
# check the summary output
summary(model1b)
Call:
lm(formula = average_price ~ type, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.21400 -0.20400 -0.02804 0.18600 1.59600
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.158040 0.003321 348.7 <2e-16 ***
typeorganic 0.495959 0.004697 105.6 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3173 on 18247 degrees of freedom
Multiple R-squared: 0.3793, Adjusted R-squared: 0.3792
F-statistic: 1.115e+04 on 1 and 18247 DF, p-value: < 2.2e-16
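With a single two-level factor as the only predictor, the coefficients have a simple reading: the intercept is the mean of the baseline group and the factor coefficient is the difference between the two group means. Read that way, the output above suggests an average price of about 1.16 for conventional and about 1.65 for organic. The sketch below checks the idea on made-up prices.

```r
# Made-up prices: three conventional, three organic
toy <- data.frame(
  average_price = c(1.0, 1.2, 1.1, 1.6, 1.8, 1.7),
  type = factor(rep(c("conventional", "organic"), each = 3))
)

fit <- lm(average_price ~ type, data = toy)

# Group means computed directly, for comparison with the coefficients
group_means <- tapply(toy$average_price, toy$type, mean)

intercept <- unname(coef(fit)["(Intercept)"])
organic_shift <- unname(coef(fit)["typeorganic"])

group_means
intercept        # matches the conventional mean
organic_shift    # matches organic mean minus conventional mean
```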
# build the model
model1c <- lm(average_price ~ region, data = trimmed_avocados)
# check the diagnostics
autoplot(model1c)
# check the summary output
summary(model1c)
Call:
lm(formula = average_price ~ region, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-0.97095 -0.28423 -0.03432 0.25207 1.76115
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.561036 0.020006 78.029 < 2e-16 ***
regionAtlanta -0.223077 0.028293 -7.885 3.33e-15 ***
regionBaltimoreWashington -0.026805 0.028293 -0.947 0.34344
regionBoise -0.212899 0.028293 -7.525 5.52e-14 ***
regionBoston -0.030148 0.028293 -1.066 0.28663
regionBuffaloRochester -0.044201 0.028293 -1.562 0.11824
regionCalifornia -0.165710 0.028293 -5.857 4.79e-09 ***
regionCharlotte 0.045000 0.028293 1.591 0.11173
regionChicago -0.004260 0.028293 -0.151 0.88031
regionCincinnatiDayton -0.351834 0.028293 -12.436 < 2e-16 ***
regionColumbus -0.308254 0.028293 -10.895 < 2e-16 ***
regionDallasFtWorth -0.475444 0.028293 -16.805 < 2e-16 ***
regionDenver -0.342456 0.028293 -12.104 < 2e-16 ***
regionDetroit -0.284941 0.028293 -10.071 < 2e-16 ***
regionGrandRapids -0.056036 0.028293 -1.981 0.04765 *
regionGreatLakes -0.222485 0.028293 -7.864 3.94e-15 ***
regionHarrisburgScranton -0.047751 0.028293 -1.688 0.09147 .
regionHartfordSpringfield 0.257604 0.028293 9.105 < 2e-16 ***
regionHouston -0.513107 0.028293 -18.136 < 2e-16 ***
regionIndianapolis -0.247041 0.028293 -8.732 < 2e-16 ***
regionJacksonville -0.050089 0.028293 -1.770 0.07668 .
regionLasVegas -0.180118 0.028293 -6.366 1.98e-10 ***
regionLosAngeles -0.345030 0.028293 -12.195 < 2e-16 ***
regionLouisville -0.274349 0.028293 -9.697 < 2e-16 ***
regionMiamiFtLauderdale -0.132544 0.028293 -4.685 2.82e-06 ***
regionMidsouth -0.156272 0.028293 -5.523 3.37e-08 ***
regionNashville -0.348935 0.028293 -12.333 < 2e-16 ***
regionNewOrleansMobile -0.256243 0.028293 -9.057 < 2e-16 ***
regionNewYork 0.166538 0.028293 5.886 4.02e-09 ***
regionNortheast 0.040888 0.028293 1.445 0.14843
regionNorthernNewEngland -0.083639 0.028293 -2.956 0.00312 **
regionOrlando -0.054822 0.028293 -1.938 0.05268 .
regionPhiladelphia 0.071095 0.028293 2.513 0.01199 *
regionPhoenixTucson -0.336598 0.028293 -11.897 < 2e-16 ***
regionPittsburgh -0.196716 0.028293 -6.953 3.70e-12 ***
regionPlains -0.124527 0.028293 -4.401 1.08e-05 ***
regionPortland -0.243314 0.028293 -8.600 < 2e-16 ***
regionRaleighGreensboro -0.005917 0.028293 -0.209 0.83434
regionRichmondNorfolk -0.269704 0.028293 -9.533 < 2e-16 ***
regionRoanoke -0.313107 0.028293 -11.067 < 2e-16 ***
regionSacramento 0.060533 0.028293 2.140 0.03241 *
regionSanDiego -0.162870 0.028293 -5.757 8.72e-09 ***
regionSanFrancisco 0.243166 0.028293 8.595 < 2e-16 ***
regionSeattle -0.118462 0.028293 -4.187 2.84e-05 ***
regionSouthCarolina -0.157751 0.028293 -5.576 2.50e-08 ***
regionSouthCentral -0.459793 0.028293 -16.251 < 2e-16 ***
regionSoutheast -0.163018 0.028293 -5.762 8.45e-09 ***
regionSpokane -0.115444 0.028293 -4.080 4.52e-05 ***
regionStLouis -0.130414 0.028293 -4.609 4.06e-06 ***
regionSyracuse -0.040710 0.028293 -1.439 0.15020
regionTampa -0.152189 0.028293 -5.379 7.58e-08 ***
regionTotalUS -0.242012 0.028293 -8.554 < 2e-16 ***
regionWest -0.288817 0.028293 -10.208 < 2e-16 ***
regionWestTexNewMexico -0.299334 0.028356 -10.556 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3678 on 18195 degrees of freedom
Multiple R-squared: 0.1681, Adjusted R-squared: 0.1657
F-statistic: 69.38 on 53 and 18195 DF, p-value: < 2.2e-16
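Scrolling through three printed summaries to compare R-squared values gets tedious. One option is to pull the numbers into a single table. This is a sketch on made-up data (the values and the choice of candidate predictors are for illustration only), using summary()$r.squared and summary()$adj.r.squared from base R.

```r
# Made-up data with three candidate predictors
toy <- data.frame(
  average_price = c(1.0, 1.2, 1.1, 1.3, 1.6, 1.8, 1.7, 1.9),
  type    = factor(rep(c("conventional", "organic"), each = 4)),
  x4046   = c(500, 300, 450, 250, 60, 20, 40, 35),
  quarter = factor(rep(1:4, times = 2))
)

# One single-predictor model per candidate
candidates <- list(
  x4046   = lm(average_price ~ x4046, data = toy),
  type    = lm(average_price ~ type, data = toy),
  quarter = lm(average_price ~ quarter, data = toy)
)

# Collect R-squared and adjusted R-squared side by side
comparison <- data.frame(
  predictor     = names(candidates),
  r_squared     = sapply(candidates, function(m) summary(m)$r.squared),
  adj_r_squared = sapply(candidates, function(m) summary(m)$adj.r.squared),
  row.names     = NULL
)

# Sort so the strongest candidate is on top
comparison[order(-comparison$r_squared), ]
```

Adjusted R-squared is included because a factor with many levels can push plain R-squared up simply by adding coefficients.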
model1b with type is best (R-squared of 0.3793, against 0.0434 for x4046 and 0.1681 for region), so we’ll keep that and re-run ggpairs() with the residuals (again omitting region because it has too many levels to plot this way).
avocados_remaining_resid <- trimmed_avocados %>%
add_residuals(model1b) %>%
select(-c("average_price", "type", "region"))
ggpairs(avocados_remaining_resid) +
theme_grey(base_size = 8) # this bit just changes the axis label font size so we can see
Again, this isn’t showing any really high correlations between the residuals and any of our numeric variables. Looks like x4046, year and quarter could potentially show something (given the rubbish variables we have).
trimmed_avocados %>%
add_residuals(model1b) %>%
ggplot(aes(x = region, y = resid)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
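The reason for plotting residuals rather than average_price at this stage is that the residuals are what the current model has not yet explained, so a variable that separates them is a good candidate for the next predictor. The sketch below reproduces that quantity in base R on made-up data (observed value minus prediction, which is our understanding of what add_residuals() stores in its resid column) and then summarises it by region.

```r
# Made-up data: two types, three regions
toy <- data.frame(
  average_price = c(1.0, 1.2, 1.1, 1.6, 1.8, 1.7),
  type   = factor(rep(c("conventional", "organic"), each = 3)),
  region = factor(rep(c("Albany", "Boston", "Chicago"), times = 2))
)

# First-round model with type only
fit <- lm(average_price ~ type, data = toy)

# Residual = observed value minus the model's prediction
toy$resid <- toy$average_price - predict(fit, newdata = toy)

# Mean residual per region: values away from zero hint that region
# explains variation that type has left over
resid_by_region <- tapply(toy$resid, toy$region, mean)

resid_by_region
```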
Looks like x4046, year, quarter and region are our next contenders to try. Let’s do these now.
model2a <- lm(average_price ~ type + x4046, data = trimmed_avocados)
autoplot(model2a)
summary(model2a)
Call:
lm(formula = average_price ~ type + x4046, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.21416 -0.20029 -0.02736 0.18591 1.59589
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.171e+00 3.485e-03 336.13 <2e-16 ***
typeorganic 4.827e-01 4.802e-03 100.52 <2e-16 ***
x4046 -2.323e-08 1.898e-09 -12.24 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.316 on 18246 degrees of freedom
Multiple R-squared: 0.3843, Adjusted R-squared: 0.3843
F-statistic: 5695 on 2 and 18246 DF, p-value: < 2.2e-16
model2b <- lm(average_price ~ type + year, data = trimmed_avocados)
autoplot(model2b)
summary(model2b)
Call:
lm(formula = average_price ~ type + year, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.32320 -0.18722 -0.01722 0.18278 1.66337
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.127645 0.004704 239.735 < 2e-16 ***
typeorganic 0.495980 0.004563 108.685 < 2e-16 ***
year2016 -0.036995 0.005817 -6.360 2.07e-10 ***
year2017 0.139580 0.005790 24.107 < 2e-16 ***
year2018 -0.028104 0.009499 -2.959 0.00309 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3082 on 18244 degrees of freedom
Multiple R-squared: 0.4142, Adjusted R-squared: 0.4141
F-statistic: 3225 on 4 and 18244 DF, p-value: < 2.2e-16
model2c <- lm(average_price ~ type + quarter, data = trimmed_avocados)
autoplot(model2c)
summary(model2c)
Call:
lm(formula = average_price ~ type + quarter, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.11458 -0.20089 -0.02458 0.18542 1.54687
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.058626 0.004718 224.38 <2e-16 ***
typeorganic 0.495958 0.004543 109.16 <2e-16 ***
quarter2 0.068546 0.006282 10.91 <2e-16 ***
quarter3 0.206308 0.006281 32.84 <2e-16 ***
quarter4 0.152040 0.006237 24.38 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3069 on 18244 degrees of freedom
Multiple R-squared: 0.4193, Adjusted R-squared: 0.4192
F-statistic: 3294 on 4 and 18244 DF, p-value: < 2.2e-16
model2d <- lm(average_price ~ type + region, data = trimmed_avocados)
autoplot(model2d)
summary(model2d)
Call:
lm(formula = average_price ~ type + region, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.09858 -0.16716 -0.01814 0.14692 1.51320
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.313079 0.014894 88.159 < 2e-16 ***
typeorganic 0.495912 0.004017 123.452 < 2e-16 ***
regionAtlanta -0.223077 0.020871 -10.688 < 2e-16 ***
regionBaltimoreWashington -0.026805 0.020871 -1.284 0.19906
regionBoise -0.212899 0.020871 -10.201 < 2e-16 ***
regionBoston -0.030148 0.020871 -1.444 0.14863
regionBuffaloRochester -0.044201 0.020871 -2.118 0.03421 *
regionCalifornia -0.165710 0.020871 -7.940 2.15e-15 ***
regionCharlotte 0.045000 0.020871 2.156 0.03109 *
regionChicago -0.004260 0.020871 -0.204 0.83826
regionCincinnatiDayton -0.351834 0.020871 -16.857 < 2e-16 ***
regionColumbus -0.308254 0.020871 -14.769 < 2e-16 ***
regionDallasFtWorth -0.475444 0.020871 -22.780 < 2e-16 ***
regionDenver -0.342456 0.020871 -16.408 < 2e-16 ***
regionDetroit -0.284941 0.020871 -13.652 < 2e-16 ***
regionGrandRapids -0.056036 0.020871 -2.685 0.00726 **
regionGreatLakes -0.222485 0.020871 -10.660 < 2e-16 ***
regionHarrisburgScranton -0.047751 0.020871 -2.288 0.02216 *
regionHartfordSpringfield 0.257604 0.020871 12.342 < 2e-16 ***
regionHouston -0.513107 0.020871 -24.584 < 2e-16 ***
regionIndianapolis -0.247041 0.020871 -11.836 < 2e-16 ***
regionJacksonville -0.050089 0.020871 -2.400 0.01641 *
regionLasVegas -0.180118 0.020871 -8.630 < 2e-16 ***
regionLosAngeles -0.345030 0.020871 -16.531 < 2e-16 ***
regionLouisville -0.274349 0.020871 -13.145 < 2e-16 ***
regionMiamiFtLauderdale -0.132544 0.020871 -6.351 2.20e-10 ***
regionMidsouth -0.156272 0.020871 -7.487 7.35e-14 ***
regionNashville -0.348935 0.020871 -16.718 < 2e-16 ***
regionNewOrleansMobile -0.256243 0.020871 -12.277 < 2e-16 ***
regionNewYork 0.166538 0.020871 7.979 1.56e-15 ***
regionNortheast 0.040888 0.020871 1.959 0.05013 .
regionNorthernNewEngland -0.083639 0.020871 -4.007 6.16e-05 ***
regionOrlando -0.054822 0.020871 -2.627 0.00863 **
regionPhiladelphia 0.071095 0.020871 3.406 0.00066 ***
regionPhoenixTucson -0.336598 0.020871 -16.127 < 2e-16 ***
regionPittsburgh -0.196716 0.020871 -9.425 < 2e-16 ***
regionPlains -0.124527 0.020871 -5.966 2.47e-09 ***
regionPortland -0.243314 0.020871 -11.658 < 2e-16 ***
regionRaleighGreensboro -0.005917 0.020871 -0.284 0.77679
regionRichmondNorfolk -0.269704 0.020871 -12.922 < 2e-16 ***
regionRoanoke -0.313107 0.020871 -15.002 < 2e-16 ***
regionSacramento 0.060533 0.020871 2.900 0.00373 **
regionSanDiego -0.162870 0.020871 -7.803 6.35e-15 ***
regionSanFrancisco 0.243166 0.020871 11.651 < 2e-16 ***
regionSeattle -0.118462 0.020871 -5.676 1.40e-08 ***
regionSouthCarolina -0.157751 0.020871 -7.558 4.28e-14 ***
regionSouthCentral -0.459793 0.020871 -22.030 < 2e-16 ***
regionSoutheast -0.163018 0.020871 -7.811 6.00e-15 ***
regionSpokane -0.115444 0.020871 -5.531 3.22e-08 ***
regionStLouis -0.130414 0.020871 -6.248 4.24e-10 ***
regionSyracuse -0.040710 0.020871 -1.951 0.05113 .
regionTampa -0.152189 0.020871 -7.292 3.18e-13 ***
regionTotalUS -0.242012 0.020871 -11.595 < 2e-16 ***
regionWest -0.288817 0.020871 -13.838 < 2e-16 ***
regionWestTexNewMexico -0.297114 0.020918 -14.204 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2713 on 18194 degrees of freedom
Multiple R-squared: 0.5473, Adjusted R-squared: 0.546
F-statistic: 407.4 on 54 and 18194 DF, p-value: < 2.2e-16
So model2d with type and region comes out as better here (R-squared of 0.5473, against 0.3843, 0.4142 and 0.4193 for the other three). We have some region coefficients that are not significant at the 0.05 level, so let’s run an anova() to test whether to include region.
# model1b is the model with average_price ~ type
# model2d is the model with average_price ~ type + region
# we want to compare the two
anova(model1b, model2d)
Analysis of Variance Table
Model 1: average_price ~ type
Model 2: average_price ~ type + region
Res.Df RSS Df Sum of Sq F Pr(>F)
1 18247 1836.7
2 18194 1339.4 53 497.26 127.44 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
It seems region is significant overall, so we’ll keep it in!
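The F value in that table can be rebuilt by hand from the two residual sums of squares, which helps show what anova() is testing: whether the drop in RSS from adding region is large relative to the noise left in the bigger model. The numbers below are copied from the printed table, so the result only matches to rounding.

```r
# Values read off the anova() table above
rss_small <- 1836.7   # RSS for average_price ~ type
rss_big   <- 1339.4   # RSS for average_price ~ type + region
df_small  <- 18247    # residual df, smaller model
df_big    <- 18194    # residual df, bigger model

# Number of extra coefficients that region adds
df_diff <- df_small - df_big

# F = (drop in RSS per extra coefficient) / (residual variance of bigger model)
f_stat <- ((rss_small - rss_big) / df_diff) / (rss_big / df_big)

# Upper-tail probability under the F distribution
p_value <- pf(f_stat, df_diff, df_big, lower.tail = FALSE)

df_diff
f_stat
p_value
```

This tests all the region coefficients together, which is why it is a better guide than the individual p-values for single regions.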
model2d is our model with average_price ~ type + region, and it explains 0.5473 of the variance in average price (multiple R-squared). This isn’t really very high, so we can think about adding a third predictor now. Again, we want to remove the variables already in the model from our data, and check the residuals.
avocados_remaining_resid <- trimmed_avocados %>%
add_residuals(model2d) %>%
select(-c("average_price", "type", "region"))
ggpairs(avocados_remaining_resid) +
theme_grey(base_size = 8) # font size of labels
plot: [7,2] [==================================================>--------------] 78% est: 2s `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plot: [7,3] [===================================================>-------------] 80% est: 1s `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plot: [7,4] [====================================================>------------] 81% est: 1s `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plot: [7,5] [=====================================================>-----------] 83% est: 1s `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plot: [7,6] [======================================================>----------] 84% est: 1s
plot: [7,7] [=======================================================>---------] 86% est: 1s
plot: [7,8] [========================================================>--------] 88% est: 1s
plot: [8,1] [=========================================================>-------] 89% est: 1s
plot: [8,2] [==========================================================>------] 91% est: 1s
plot: [8,3] [===========================================================>-----] 92% est: 1s
plot: [8,4] [============================================================>----] 94% est: 0s
plot: [8,5] [=============================================================>---] 95% est: 0s
plot: [8,6] [==============================================================>--] 97% est: 0s `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plot: [8,7] [===============================================================>-] 98% est: 0s `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plot: [8,8] [=================================================================]100% est: 0s
The next contender variables look to be x_large_bags, year and quarter. Let’s try them out.
model3a <- lm(average_price ~ type + region + x_large_bags, data = trimmed_avocados)
autoplot(model3a)
summary(model3a)
Call:
lm(formula = average_price ~ type + region + x_large_bags, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.10024 -0.16726 -0.01734 0.14591 1.51156
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.311e+00 1.489e-02 88.033 < 2e-16 ***
typeorganic 5.001e-01 4.101e-03 121.953 < 2e-16 ***
regionAtlanta -2.235e-01 2.086e-02 -10.718 < 2e-16 ***
regionBaltimoreWashington -2.713e-02 2.086e-02 -1.301 0.193298
regionBoise -2.128e-01 2.086e-02 -10.204 < 2e-16 ***
regionBoston -3.023e-02 2.086e-02 -1.449 0.147234
regionBuffaloRochester -4.428e-02 2.086e-02 -2.123 0.033774 *
regionCalifornia -1.762e-01 2.096e-02 -8.408 < 2e-16 ***
regionCharlotte 4.495e-02 2.086e-02 2.155 0.031177 *
regionChicago -4.936e-03 2.086e-02 -0.237 0.812924
regionCincinnatiDayton -3.523e-01 2.086e-02 -16.890 < 2e-16 ***
regionColumbus -3.086e-01 2.086e-02 -14.796 < 2e-16 ***
regionDallasFtWorth -4.762e-01 2.086e-02 -22.832 < 2e-16 ***
regionDenver -3.425e-01 2.086e-02 -16.420 < 2e-16 ***
regionDetroit -2.882e-01 2.087e-02 -13.810 < 2e-16 ***
regionGrandRapids -5.764e-02 2.086e-02 -2.763 0.005731 **
regionGreatLakes -2.353e-01 2.101e-02 -11.198 < 2e-16 ***
regionHarrisburgScranton -4.798e-02 2.086e-02 -2.300 0.021451 *
regionHartfordSpringfield 2.575e-01 2.086e-02 12.347 < 2e-16 ***
regionHouston -5.137e-01 2.086e-02 -24.628 < 2e-16 ***
regionIndianapolis -2.475e-01 2.086e-02 -11.867 < 2e-16 ***
regionJacksonville -5.021e-02 2.086e-02 -2.407 0.016074 *
regionLasVegas -1.801e-01 2.086e-02 -8.633 < 2e-16 ***
regionLosAngeles -3.532e-01 2.092e-02 -16.881 < 2e-16 ***
regionLouisville -2.745e-01 2.086e-02 -13.160 < 2e-16 ***
regionMiamiFtLauderdale -1.331e-01 2.086e-02 -6.380 1.81e-10 ***
regionMidsouth -1.590e-01 2.086e-02 -7.619 2.68e-14 ***
regionNashville -3.491e-01 2.086e-02 -16.736 < 2e-16 ***
regionNewOrleansMobile -2.572e-01 2.086e-02 -12.330 < 2e-16 ***
regionNewYork 1.659e-01 2.086e-02 7.954 1.91e-15 ***
regionNortheast 3.834e-02 2.086e-02 1.838 0.066151 .
regionNorthernNewEngland -8.377e-02 2.086e-02 -4.017 5.93e-05 ***
regionOrlando -5.523e-02 2.086e-02 -2.648 0.008111 **
regionPhiladelphia 7.097e-02 2.086e-02 3.403 0.000669 ***
regionPhoenixTucson -3.368e-01 2.086e-02 -16.149 < 2e-16 ***
regionPittsburgh -1.967e-01 2.086e-02 -9.433 < 2e-16 ***
regionPlains -1.267e-01 2.086e-02 -6.072 1.29e-09 ***
regionPortland -2.434e-01 2.086e-02 -11.669 < 2e-16 ***
regionRaleighGreensboro -6.021e-03 2.086e-02 -0.289 0.772828
regionRichmondNorfolk -2.699e-01 2.086e-02 -12.939 < 2e-16 ***
regionRoanoke -3.132e-01 2.086e-02 -15.015 < 2e-16 ***
regionSacramento 6.020e-02 2.086e-02 2.886 0.003904 **
regionSanDiego -1.631e-01 2.086e-02 -7.819 5.64e-15 ***
regionSanFrancisco 2.428e-01 2.086e-02 11.642 < 2e-16 ***
regionSeattle -1.185e-01 2.086e-02 -5.682 1.35e-08 ***
regionSouthCarolina -1.581e-01 2.086e-02 -7.581 3.59e-14 ***
regionSouthCentral -4.650e-01 2.088e-02 -22.268 < 2e-16 ***
regionSoutheast -1.680e-01 2.088e-02 -8.046 9.10e-16 ***
regionSpokane -1.154e-01 2.086e-02 -5.531 3.22e-08 ***
regionStLouis -1.308e-01 2.086e-02 -6.270 3.69e-10 ***
regionSyracuse -4.071e-02 2.086e-02 -1.952 0.050993 .
regionTampa -1.526e-01 2.086e-02 -7.315 2.68e-13 ***
regionTotalUS -2.852e-01 2.255e-02 -12.648 < 2e-16 ***
regionWest -2.904e-01 2.086e-02 -13.922 < 2e-16 ***
regionWestTexNewMexico -2.976e-01 2.090e-02 -14.238 < 2e-16 ***
x_large_bags 6.810e-07 1.351e-07 5.040 4.70e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2711 on 18193 degrees of freedom
Multiple R-squared: 0.548, Adjusted R-squared: 0.5466
F-statistic: 401 on 55 and 18193 DF, p-value: < 2.2e-16
model3b <- lm(average_price ~ type + region + year, data = trimmed_avocados)
autoplot(model3b)
summary(model3b)
Call:
lm(formula = average_price ~ type + region + year, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.1532 -0.1497 -0.0060 0.1419 1.4849
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.282672 0.014600 87.857 < 2e-16 ***
typeorganic 0.495933 0.003859 128.501 < 2e-16 ***
regionAtlanta -0.223077 0.020052 -11.125 < 2e-16 ***
regionBaltimoreWashington -0.026805 0.020052 -1.337 0.181322
regionBoise -0.212899 0.020052 -10.617 < 2e-16 ***
regionBoston -0.030148 0.020052 -1.503 0.132735
regionBuffaloRochester -0.044201 0.020052 -2.204 0.027515 *
regionCalifornia -0.165710 0.020052 -8.264 < 2e-16 ***
regionCharlotte 0.045000 0.020052 2.244 0.024835 *
regionChicago -0.004260 0.020052 -0.212 0.831748
regionCincinnatiDayton -0.351834 0.020052 -17.546 < 2e-16 ***
regionColumbus -0.308254 0.020052 -15.373 < 2e-16 ***
regionDallasFtWorth -0.475444 0.020052 -23.710 < 2e-16 ***
regionDenver -0.342456 0.020052 -17.078 < 2e-16 ***
regionDetroit -0.284941 0.020052 -14.210 < 2e-16 ***
regionGrandRapids -0.056036 0.020052 -2.794 0.005204 **
regionGreatLakes -0.222485 0.020052 -11.095 < 2e-16 ***
regionHarrisburgScranton -0.047751 0.020052 -2.381 0.017259 *
regionHartfordSpringfield 0.257604 0.020052 12.847 < 2e-16 ***
regionHouston -0.513107 0.020052 -25.589 < 2e-16 ***
regionIndianapolis -0.247041 0.020052 -12.320 < 2e-16 ***
regionJacksonville -0.050089 0.020052 -2.498 0.012501 *
regionLasVegas -0.180118 0.020052 -8.982 < 2e-16 ***
regionLosAngeles -0.345030 0.020052 -17.207 < 2e-16 ***
regionLouisville -0.274349 0.020052 -13.682 < 2e-16 ***
regionMiamiFtLauderdale -0.132544 0.020052 -6.610 3.95e-11 ***
regionMidsouth -0.156272 0.020052 -7.793 6.88e-15 ***
regionNashville -0.348935 0.020052 -17.401 < 2e-16 ***
regionNewOrleansMobile -0.256243 0.020052 -12.779 < 2e-16 ***
regionNewYork 0.166538 0.020052 8.305 < 2e-16 ***
regionNortheast 0.040888 0.020052 2.039 0.041459 *
regionNorthernNewEngland -0.083639 0.020052 -4.171 3.05e-05 ***
regionOrlando -0.054822 0.020052 -2.734 0.006263 **
regionPhiladelphia 0.071095 0.020052 3.545 0.000393 ***
regionPhoenixTucson -0.336598 0.020052 -16.786 < 2e-16 ***
regionPittsburgh -0.196716 0.020052 -9.810 < 2e-16 ***
regionPlains -0.124527 0.020052 -6.210 5.41e-10 ***
regionPortland -0.243314 0.020052 -12.134 < 2e-16 ***
regionRaleighGreensboro -0.005917 0.020052 -0.295 0.767930
regionRichmondNorfolk -0.269704 0.020052 -13.450 < 2e-16 ***
regionRoanoke -0.313107 0.020052 -15.615 < 2e-16 ***
regionSacramento 0.060533 0.020052 3.019 0.002542 **
regionSanDiego -0.162870 0.020052 -8.122 4.86e-16 ***
regionSanFrancisco 0.243166 0.020052 12.127 < 2e-16 ***
regionSeattle -0.118462 0.020052 -5.908 3.53e-09 ***
regionSouthCarolina -0.157751 0.020052 -7.867 3.83e-15 ***
regionSouthCentral -0.459793 0.020052 -22.930 < 2e-16 ***
regionSoutheast -0.163018 0.020052 -8.130 4.58e-16 ***
regionSpokane -0.115444 0.020052 -5.757 8.69e-09 ***
regionStLouis -0.130414 0.020052 -6.504 8.04e-11 ***
regionSyracuse -0.040710 0.020052 -2.030 0.042350 *
regionTampa -0.152189 0.020052 -7.590 3.36e-14 ***
regionTotalUS -0.242012 0.020052 -12.069 < 2e-16 ***
regionWest -0.288817 0.020052 -14.403 < 2e-16 ***
regionWestTexNewMexico -0.296552 0.020097 -14.756 < 2e-16 ***
year2016 -0.036970 0.004920 -7.515 5.96e-14 ***
year2017 0.139555 0.004897 28.500 < 2e-16 ***
year2018 -0.028078 0.008033 -3.495 0.000475 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2607 on 18191 degrees of freedom
Multiple R-squared: 0.5822, Adjusted R-squared: 0.5809
F-statistic: 444.8 on 57 and 18191 DF, p-value: < 2.2e-16
model3c <- lm(average_price ~ type + region + quarter, data = trimmed_avocados)
autoplot(model3c)
summary(model3c)
Call:
lm(formula = average_price ~ type + region + quarter, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.06767 -0.15971 -0.01185 0.14629 1.54411
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.213689 0.014517 83.603 < 2e-16 ***
typeorganic 0.495911 0.003835 129.296 < 2e-16 ***
regionAtlanta -0.223077 0.019928 -11.194 < 2e-16 ***
regionBaltimoreWashington -0.026805 0.019928 -1.345 0.178619
regionBoise -0.212899 0.019928 -10.683 < 2e-16 ***
regionBoston -0.030148 0.019928 -1.513 0.130339
regionBuffaloRochester -0.044201 0.019928 -2.218 0.026565 *
regionCalifornia -0.165710 0.019928 -8.315 < 2e-16 ***
regionCharlotte 0.045000 0.019928 2.258 0.023950 *
regionChicago -0.004260 0.019928 -0.214 0.830716
regionCincinnatiDayton -0.351834 0.019928 -17.655 < 2e-16 ***
regionColumbus -0.308254 0.019928 -15.468 < 2e-16 ***
regionDallasFtWorth -0.475444 0.019928 -23.858 < 2e-16 ***
regionDenver -0.342456 0.019928 -17.185 < 2e-16 ***
regionDetroit -0.284941 0.019928 -14.298 < 2e-16 ***
regionGrandRapids -0.056036 0.019928 -2.812 0.004931 **
regionGreatLakes -0.222485 0.019928 -11.164 < 2e-16 ***
regionHarrisburgScranton -0.047751 0.019928 -2.396 0.016577 *
regionHartfordSpringfield 0.257604 0.019928 12.927 < 2e-16 ***
regionHouston -0.513107 0.019928 -25.748 < 2e-16 ***
regionIndianapolis -0.247041 0.019928 -12.397 < 2e-16 ***
regionJacksonville -0.050089 0.019928 -2.513 0.011963 *
regionLasVegas -0.180118 0.019928 -9.038 < 2e-16 ***
regionLosAngeles -0.345030 0.019928 -17.314 < 2e-16 ***
regionLouisville -0.274349 0.019928 -13.767 < 2e-16 ***
regionMiamiFtLauderdale -0.132544 0.019928 -6.651 2.99e-11 ***
regionMidsouth -0.156272 0.019928 -7.842 4.69e-15 ***
regionNashville -0.348935 0.019928 -17.510 < 2e-16 ***
regionNewOrleansMobile -0.256243 0.019928 -12.858 < 2e-16 ***
regionNewYork 0.166538 0.019928 8.357 < 2e-16 ***
regionNortheast 0.040888 0.019928 2.052 0.040208 *
regionNorthernNewEngland -0.083639 0.019928 -4.197 2.72e-05 ***
regionOrlando -0.054822 0.019928 -2.751 0.005947 **
regionPhiladelphia 0.071095 0.019928 3.568 0.000361 ***
regionPhoenixTucson -0.336598 0.019928 -16.891 < 2e-16 ***
regionPittsburgh -0.196716 0.019928 -9.871 < 2e-16 ***
regionPlains -0.124527 0.019928 -6.249 4.23e-10 ***
regionPortland -0.243314 0.019928 -12.210 < 2e-16 ***
regionRaleighGreensboro -0.005917 0.019928 -0.297 0.766527
regionRichmondNorfolk -0.269704 0.019928 -13.534 < 2e-16 ***
regionRoanoke -0.313107 0.019928 -15.712 < 2e-16 ***
regionSacramento 0.060533 0.019928 3.038 0.002389 **
regionSanDiego -0.162870 0.019928 -8.173 3.21e-16 ***
regionSanFrancisco 0.243166 0.019928 12.202 < 2e-16 ***
regionSeattle -0.118462 0.019928 -5.944 2.82e-09 ***
regionSouthCarolina -0.157751 0.019928 -7.916 2.59e-15 ***
regionSouthCentral -0.459793 0.019928 -23.073 < 2e-16 ***
regionSoutheast -0.163018 0.019928 -8.180 3.02e-16 ***
regionSpokane -0.115444 0.019928 -5.793 7.03e-09 ***
regionStLouis -0.130414 0.019928 -6.544 6.14e-11 ***
regionSyracuse -0.040710 0.019928 -2.043 0.041082 *
regionTampa -0.152189 0.019928 -7.637 2.33e-14 ***
regionTotalUS -0.242012 0.019928 -12.144 < 2e-16 ***
regionWest -0.288817 0.019928 -14.493 < 2e-16 ***
regionWestTexNewMexico -0.297141 0.019973 -14.877 < 2e-16 ***
quarter2 0.068479 0.005303 12.912 < 2e-16 ***
quarter3 0.206308 0.005303 38.906 < 2e-16 ***
quarter4 0.152007 0.005265 28.869 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2591 on 18191 degrees of freedom
Multiple R-squared: 0.5874, Adjusted R-squared: 0.5861
F-statistic: 454.3 on 57 and 18191 DF, p-value: < 2.2e-16
So model3c, with type, region and quarter, wins out here: its R^2 of 0.5874 beats model3b (0.5822) and model3a (0.548). The diagnostics still look reasonable, with perhaps some mild heteroscedasticity.
Remember that with two predictors, our R^2 value was 0.5473. Now, with three predictors, we are at 0.5874, which seems a reasonable improvement. So let’s see how much improvement we get by adding a fourth variable. Again, we check the residuals to see which variables we should try adding.
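Scrolling through three long summary() printouts to compare R^2 values is error-prone, so here is a minimal sketch of a helper that collects the headline numbers into one table. The function name compare_fits() is my own invention, not part of any package, and the demo uses the built-in mtcars data as a stand-in so that it runs without the avocado file.

```r
# Hypothetical helper (not from any package): one row of fit statistics per model.
compare_fits <- function(models) {
  data.frame(
    model = names(models),
    r_squared = vapply(models, function(m) summary(m)$r.squared, numeric(1)),
    adj_r_squared = vapply(models, function(m) summary(m)$adj.r.squared, numeric(1)),
    n_coefficients = vapply(models, function(m) length(coef(m)), integer(1)),
    row.names = NULL
  )
}

# Stand-in demo on mtcars: two candidate third predictors on top of mpg ~ wt.
candidate_models <- list(
  with_hp   = lm(mpg ~ wt + hp, data = mtcars),
  with_qsec = lm(mpg ~ wt + qsec, data = mtcars)
)
fit_table <- compare_fits(candidate_models)
print(fit_table)
```

In this analysis the call would be compare_fits(list(model3a = model3a, model3b = model3b, model3c = model3c)). Including adjusted R^2 is a deliberate choice: region alone contributes dozens of coefficients, so it is worth keeping an eye on the penalised figure as well.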
avocados_remaining_resid <- trimmed_avocados %>%
add_residuals(model3c) %>%
select(-c("average_price", "type", "region", "quarter"))
ggpairs(avocados_remaining_resid) +
theme_grey(base_size = 8) # font size of labels
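Reading contenders off a ggpairs() grid is a judgement call, so a rough numeric complement can help. The sketch below regresses the residuals on each remaining variable in turn and reports how much residual variation each one explains. The function screen_by_residuals() is a hypothetical helper of my own, it assumes lm() dropped no rows for missing values, and the demo again uses mtcars as a stand-in. It is only a screening heuristic: the residual R^2 is not the same as the gain in the full model's R^2, because a candidate may be correlated with predictors already in the model.

```r
# Hypothetical helper: rank candidate predictors by how much of the
# current model's residual variation each one explains on its own.
# Assumes residuals(model) has one value per row of `data` (no NA rows dropped).
screen_by_residuals <- function(model, data, candidates) {
  data$.resid <- residuals(model)
  resid_r2 <- vapply(candidates, function(v) {
    fit <- lm(reformulate(v, response = ".resid"), data = data)
    summary(fit)$r.squared
  }, numeric(1))
  out <- data.frame(candidate = candidates, resid_r_squared = unname(resid_r2))
  out <- out[order(-out$resid_r_squared), ]
  rownames(out) <- NULL
  out
}

# Stand-in demo on mtcars: which variable best explains what mpg ~ wt leaves over?
base_model <- lm(mpg ~ wt, data = mtcars)
screening <- screen_by_residuals(base_model, mtcars, c("hp", "qsec", "drat"))
print(screening)
```

For the avocado models this would be something like screen_by_residuals(model3c, trimmed_avocados, c("year", "x_large_bags", "total_volume")), with the candidate names taken from whatever columns remain.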
The contender variables here are x_large_bags and year, so let’s try them out.
model4a <- lm(average_price ~ type + region + quarter + x_large_bags, data = trimmed_avocados)
autoplot(model4a)
summary(model4a)
Call:
lm(formula = average_price ~ type + region + quarter + x_large_bags,
data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.06889 -0.16013 -0.01154 0.14553 1.54291
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.212e+00 1.451e-02 83.493 < 2e-16 ***
typeorganic 4.998e-01 3.916e-03 127.614 < 2e-16 ***
regionAtlanta -2.235e-01 1.992e-02 -11.222 < 2e-16 ***
regionBaltimoreWashington -2.711e-02 1.992e-02 -1.361 0.173535
regionBoise -2.128e-01 1.992e-02 -10.687 < 2e-16 ***
regionBoston -3.022e-02 1.992e-02 -1.518 0.129137
regionBuffaloRochester -4.427e-02 1.992e-02 -2.223 0.026233 *
regionCalifornia -1.753e-01 2.002e-02 -8.759 < 2e-16 ***
regionCharlotte 4.495e-02 1.992e-02 2.257 0.024015 *
regionChicago -4.877e-03 1.992e-02 -0.245 0.806549
regionCincinnatiDayton -3.522e-01 1.992e-02 -17.686 < 2e-16 ***
regionColumbus -3.086e-01 1.992e-02 -15.494 < 2e-16 ***
regionDallasFtWorth -4.762e-01 1.992e-02 -23.908 < 2e-16 ***
regionDenver -3.425e-01 1.992e-02 -17.196 < 2e-16 ***
regionDetroit -2.879e-01 1.993e-02 -14.449 < 2e-16 ***
regionGrandRapids -5.750e-02 1.992e-02 -2.887 0.003898 **
regionGreatLakes -2.342e-01 2.006e-02 -11.671 < 2e-16 ***
regionHarrisburgScranton -4.796e-02 1.992e-02 -2.408 0.016054 *
regionHartfordSpringfield 2.575e-01 1.992e-02 12.931 < 2e-16 ***
regionHouston -5.136e-01 1.992e-02 -25.789 < 2e-16 ***
regionIndianapolis -2.475e-01 1.992e-02 -12.426 < 2e-16 ***
regionJacksonville -5.020e-02 1.992e-02 -2.521 0.011720 *
regionLasVegas -1.801e-01 1.992e-02 -9.041 < 2e-16 ***
regionLosAngeles -3.524e-01 1.998e-02 -17.644 < 2e-16 ***
regionLouisville -2.745e-01 1.992e-02 -13.781 < 2e-16 ***
regionMiamiFtLauderdale -1.330e-01 1.992e-02 -6.679 2.47e-11 ***
regionMidsouth -1.587e-01 1.992e-02 -7.967 1.72e-15 ***
regionNashville -3.491e-01 1.992e-02 -17.527 < 2e-16 ***
regionNewOrleansMobile -2.571e-01 1.992e-02 -12.909 < 2e-16 ***
regionNewYork 1.660e-01 1.992e-02 8.333 < 2e-16 ***
regionNortheast 3.856e-02 1.992e-02 1.936 0.052939 .
regionNorthernNewEngland -8.376e-02 1.992e-02 -4.206 2.61e-05 ***
regionOrlando -5.519e-02 1.992e-02 -2.771 0.005592 **
regionPhiladelphia 7.098e-02 1.992e-02 3.564 0.000366 ***
regionPhoenixTucson -3.368e-01 1.992e-02 -16.911 < 2e-16 ***
regionPittsburgh -1.967e-01 1.992e-02 -9.879 < 2e-16 ***
regionPlains -1.265e-01 1.992e-02 -6.350 2.20e-10 ***
regionPortland -2.434e-01 1.992e-02 -12.220 < 2e-16 ***
regionRaleighGreensboro -6.012e-03 1.992e-02 -0.302 0.762753
regionRichmondNorfolk -2.699e-01 1.992e-02 -13.549 < 2e-16 ***
regionRoanoke -3.132e-01 1.992e-02 -15.725 < 2e-16 ***
regionSacramento 6.023e-02 1.992e-02 3.024 0.002497 **
regionSanDiego -1.631e-01 1.992e-02 -8.187 2.85e-16 ***
regionSanFrancisco 2.429e-01 1.992e-02 12.194 < 2e-16 ***
regionSeattle -1.185e-01 1.992e-02 -5.950 2.72e-09 ***
regionSouthCarolina -1.581e-01 1.992e-02 -7.938 2.18e-15 ***
regionSouthCentral -4.646e-01 1.994e-02 -23.297 < 2e-16 ***
regionSoutheast -1.676e-01 1.994e-02 -8.404 < 2e-16 ***
regionSpokane -1.154e-01 1.992e-02 -5.793 7.02e-09 ***
regionStLouis -1.307e-01 1.992e-02 -6.565 5.35e-11 ***
regionSyracuse -4.071e-02 1.992e-02 -2.044 0.040974 *
regionTampa -1.525e-01 1.992e-02 -7.659 1.96e-14 ***
regionTotalUS -2.814e-01 2.153e-02 -13.068 < 2e-16 ***
regionWest -2.903e-01 1.992e-02 -14.573 < 2e-16 ***
regionWestTexNewMexico -2.976e-01 1.996e-02 -14.910 < 2e-16 ***
quarter2 6.806e-02 5.301e-03 12.839 < 2e-16 ***
quarter3 2.055e-01 5.302e-03 38.761 < 2e-16 ***
quarter4 1.527e-01 5.264e-03 29.001 < 2e-16 ***
x_large_bags 6.215e-07 1.292e-07 4.810 1.52e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2589 on 18190 degrees of freedom
Multiple R-squared: 0.5879, Adjusted R-squared: 0.5866
F-statistic: 447.4 on 58 and 18190 DF, p-value: < 2.2e-16
model4b <- lm(average_price ~ type + region + quarter + year, data = trimmed_avocados)
autoplot(model4b)
summary(model4b)
Call:
lm(formula = average_price ~ type + region + quarter + year,
data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.03683 -0.14588 -0.00412 0.14386 1.43930
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.167184 0.014290 81.677 < 2e-16 ***
typeorganic 0.495930 0.003675 134.950 < 2e-16 ***
regionAtlanta -0.223077 0.019094 -11.683 < 2e-16 ***
regionBaltimoreWashington -0.026805 0.019094 -1.404 0.160383
regionBoise -0.212899 0.019094 -11.150 < 2e-16 ***
regionBoston -0.030148 0.019094 -1.579 0.114368
regionBuffaloRochester -0.044201 0.019094 -2.315 0.020627 *
regionCalifornia -0.165710 0.019094 -8.679 < 2e-16 ***
regionCharlotte 0.045000 0.019094 2.357 0.018445 *
regionChicago -0.004260 0.019094 -0.223 0.823439
regionCincinnatiDayton -0.351834 0.019094 -18.427 < 2e-16 ***
regionColumbus -0.308254 0.019094 -16.144 < 2e-16 ***
regionDallasFtWorth -0.475444 0.019094 -24.900 < 2e-16 ***
regionDenver -0.342456 0.019094 -17.935 < 2e-16 ***
regionDetroit -0.284941 0.019094 -14.923 < 2e-16 ***
regionGrandRapids -0.056036 0.019094 -2.935 0.003342 **
regionGreatLakes -0.222485 0.019094 -11.652 < 2e-16 ***
regionHarrisburgScranton -0.047751 0.019094 -2.501 0.012397 *
regionHartfordSpringfield 0.257604 0.019094 13.491 < 2e-16 ***
regionHouston -0.513107 0.019094 -26.873 < 2e-16 ***
regionIndianapolis -0.247041 0.019094 -12.938 < 2e-16 ***
regionJacksonville -0.050089 0.019094 -2.623 0.008716 **
regionLasVegas -0.180118 0.019094 -9.433 < 2e-16 ***
regionLosAngeles -0.345030 0.019094 -18.070 < 2e-16 ***
regionLouisville -0.274349 0.019094 -14.368 < 2e-16 ***
regionMiamiFtLauderdale -0.132544 0.019094 -6.942 4.00e-12 ***
regionMidsouth -0.156272 0.019094 -8.184 2.91e-16 ***
regionNashville -0.348935 0.019094 -18.275 < 2e-16 ***
regionNewOrleansMobile -0.256243 0.019094 -13.420 < 2e-16 ***
regionNewYork 0.166538 0.019094 8.722 < 2e-16 ***
regionNortheast 0.040888 0.019094 2.141 0.032255 *
regionNorthernNewEngland -0.083639 0.019094 -4.380 1.19e-05 ***
regionOrlando -0.054822 0.019094 -2.871 0.004094 **
regionPhiladelphia 0.071095 0.019094 3.723 0.000197 ***
regionPhoenixTucson -0.336598 0.019094 -17.629 < 2e-16 ***
regionPittsburgh -0.196716 0.019094 -10.303 < 2e-16 ***
regionPlains -0.124527 0.019094 -6.522 7.13e-11 ***
regionPortland -0.243314 0.019094 -12.743 < 2e-16 ***
regionRaleighGreensboro -0.005917 0.019094 -0.310 0.756641
regionRichmondNorfolk -0.269704 0.019094 -14.125 < 2e-16 ***
regionRoanoke -0.313107 0.019094 -16.398 < 2e-16 ***
regionSacramento 0.060533 0.019094 3.170 0.001526 **
regionSanDiego -0.162870 0.019094 -8.530 < 2e-16 ***
regionSanFrancisco 0.243166 0.019094 12.735 < 2e-16 ***
regionSeattle -0.118462 0.019094 -6.204 5.62e-10 ***
regionSouthCarolina -0.157751 0.019094 -8.262 < 2e-16 ***
regionSouthCentral -0.459793 0.019094 -24.081 < 2e-16 ***
regionSoutheast -0.163018 0.019094 -8.538 < 2e-16 ***
regionSpokane -0.115444 0.019094 -6.046 1.51e-09 ***
regionStLouis -0.130414 0.019094 -6.830 8.75e-12 ***
regionSyracuse -0.040710 0.019094 -2.132 0.033011 *
regionTampa -0.152189 0.019094 -7.971 1.67e-15 ***
regionTotalUS -0.242012 0.019094 -12.675 < 2e-16 ***
regionWest -0.288817 0.019094 -15.126 < 2e-16 ***
regionWestTexNewMexico -0.296624 0.019137 -15.500 < 2e-16 ***
quarter2 0.081121 0.005410 14.996 < 2e-16 ***
quarter3 0.218901 0.005409 40.471 < 2e-16 ***
quarter4 0.161972 0.005376 30.130 < 2e-16 ***
year2016 -0.036978 0.004684 -7.894 3.10e-15 ***
year2017 0.138658 0.004663 29.735 < 2e-16 ***
year2018 0.087412 0.008334 10.488 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2482 on 18188 degrees of freedom
Multiple R-squared: 0.6213, Adjusted R-squared: 0.62
F-statistic: 497.3 on 60 and 18188 DF, p-value: < 2.2e-16
Hmm, model4b with type, region, quarter and year wins here. It has improved our R^2 from 0.5874 (with three predictors) to 0.6213, whereas model4a only reached 0.5879. That’s quite good.
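Because year is a factor, it enters the model as several dummy coefficients rather than one, so the individual p-values in the summary don’t tell us directly whether the variable as a whole earns its place. One common check is an F-test between the nested models using base R’s anova(). Here is a minimal sketch on the built-in mtcars data as a stand-in, since it runs without the avocado file.

```r
# Stand-in data: mtcars ships with R.
# The smaller model must be nested inside the larger one.
smaller_model <- lm(mpg ~ wt, data = mtcars)
larger_model  <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# One F-test for all the dummy coefficients that factor(cyl) adds.
comparison <- anova(smaller_model, larger_model)
print(comparison)
```

For our models the equivalent call is anova(model3c, model4b), which tests the three year coefficients jointly. A small p-value in the Pr(>F) column would back up what the jump in R^2 already suggests.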
We are likely now pursuing variables with rather limited explanatory power, but let’s check for one more main effect, and see how much predictive power it gives us.
avocados_remaining_resid <- trimmed_avocados %>%
add_residuals(model4b) %>%
select(-c("average_price", "type", "region", "quarter", "year"))
ggpairs(avocados_remaining_resid) +
theme_grey(base_size = 8) # font size of labels
It looks like x_large_bags is the remaining contender, so let’s check it out!
model5 <- lm(average_price ~ type + region + quarter + year + x_large_bags, data = trimmed_avocados)
autoplot(model5)
summary(model5)
Call:
lm(formula = average_price ~ type + region + quarter + year +
x_large_bags, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.03610 -0.14545 -0.00439 0.14420 1.43907
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.167e+00 1.429e-02 81.687 < 2e-16 ***
typeorganic 4.982e-01 3.755e-03 132.674 < 2e-16 ***
regionAtlanta -2.233e-01 1.909e-02 -11.698 < 2e-16 ***
regionBaltimoreWashington -2.698e-02 1.909e-02 -1.413 0.157614
regionBoise -2.129e-01 1.909e-02 -11.151 < 2e-16 ***
regionBoston -3.019e-02 1.909e-02 -1.582 0.113769
regionBuffaloRochester -4.424e-02 1.909e-02 -2.318 0.020485 *
regionCalifornia -1.713e-01 1.919e-02 -8.925 < 2e-16 ***
regionCharlotte 4.497e-02 1.909e-02 2.356 0.018493 *
regionChicago -4.616e-03 1.909e-02 -0.242 0.808941
regionCincinnatiDayton -3.521e-01 1.909e-02 -18.442 < 2e-16 ***
regionColumbus -3.084e-01 1.909e-02 -16.157 < 2e-16 ***
regionDallasFtWorth -4.759e-01 1.909e-02 -24.926 < 2e-16 ***
regionDenver -3.425e-01 1.909e-02 -17.940 < 2e-16 ***
regionDetroit -2.866e-01 1.910e-02 -15.008 < 2e-16 ***
regionGrandRapids -5.688e-02 1.909e-02 -2.979 0.002894 **
regionGreatLakes -2.292e-01 1.923e-02 -11.918 < 2e-16 ***
regionHarrisburgScranton -4.787e-02 1.909e-02 -2.508 0.012166 *
regionHartfordSpringfield 2.576e-01 1.909e-02 13.492 < 2e-16 ***
regionHouston -5.134e-01 1.909e-02 -26.894 < 2e-16 ***
regionIndianapolis -2.473e-01 1.909e-02 -12.954 < 2e-16 ***
regionJacksonville -5.015e-02 1.909e-02 -2.627 0.008615 **
regionLasVegas -1.801e-01 1.909e-02 -9.434 < 2e-16 ***
regionLosAngeles -3.493e-01 1.915e-02 -18.243 < 2e-16 ***
regionLouisville -2.744e-01 1.909e-02 -14.375 < 2e-16 ***
regionMiamiFtLauderdale -1.328e-01 1.909e-02 -6.958 3.58e-12 ***
regionMidsouth -1.577e-01 1.910e-02 -8.257 < 2e-16 ***
regionNashville -3.490e-01 1.909e-02 -18.282 < 2e-16 ***
regionNewOrleansMobile -2.567e-01 1.909e-02 -13.448 < 2e-16 ***
regionNewYork 1.662e-01 1.909e-02 8.706 < 2e-16 ***
regionNortheast 3.955e-02 1.910e-02 2.071 0.038381 *
regionNorthernNewEngland -8.371e-02 1.909e-02 -4.385 1.17e-05 ***
regionOrlando -5.503e-02 1.909e-02 -2.883 0.003945 **
regionPhiladelphia 7.103e-02 1.909e-02 3.721 0.000199 ***
regionPhoenixTucson -3.367e-01 1.909e-02 -17.638 < 2e-16 ***
regionPittsburgh -1.967e-01 1.909e-02 -10.305 < 2e-16 ***
regionPlains -1.257e-01 1.909e-02 -6.581 4.80e-11 ***
regionPortland -2.434e-01 1.909e-02 -12.748 < 2e-16 ***
regionRaleighGreensboro -5.972e-03 1.909e-02 -0.313 0.754415
regionRichmondNorfolk -2.698e-01 1.909e-02 -14.132 < 2e-16 ***
regionRoanoke -3.131e-01 1.909e-02 -16.404 < 2e-16 ***
regionSacramento 6.036e-02 1.909e-02 3.162 0.001571 **
regionSanDiego -1.630e-01 1.909e-02 -8.537 < 2e-16 ***
regionSanFrancisco 2.430e-01 1.909e-02 12.728 < 2e-16 ***
regionSeattle -1.185e-01 1.909e-02 -6.207 5.52e-10 ***
regionSouthCarolina -1.579e-01 1.909e-02 -8.274 < 2e-16 ***
regionSouthCentral -4.625e-01 1.911e-02 -24.199 < 2e-16 ***
regionSoutheast -1.656e-01 1.911e-02 -8.667 < 2e-16 ***
regionSpokane -1.154e-01 1.909e-02 -6.045 1.52e-09 ***
regionStLouis -1.306e-01 1.909e-02 -6.842 8.08e-12 ***
regionSyracuse -4.071e-02 1.909e-02 -2.132 0.032984 *
regionTampa -1.524e-01 1.909e-02 -7.983 1.52e-15 ***
regionTotalUS -2.647e-01 2.066e-02 -12.815 < 2e-16 ***
regionWest -2.897e-01 1.909e-02 -15.171 < 2e-16 ***
regionWestTexNewMexico -2.969e-01 1.913e-02 -15.518 < 2e-16 ***
quarter2 8.058e-02 5.412e-03 14.891 < 2e-16 ***
quarter3 2.181e-01 5.414e-03 40.293 < 2e-16 ***
quarter4 1.621e-01 5.375e-03 30.154 < 2e-16 ***
year2016 -3.791e-02 4.695e-03 -8.075 7.16e-16 ***
year2017 1.375e-01 4.680e-03 29.381 < 2e-16 ***
year2018 8.547e-02 8.360e-03 10.223 < 2e-16 ***
x_large_bags 3.583e-07 1.246e-07 2.877 0.004025 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2482 on 18187 degrees of freedom
Multiple R-squared: 0.6214, Adjusted R-squared: 0.6202
F-statistic: 489.4 on 61 and 18187 DF, p-value: < 2.2e-16
Overall, we still have some heteroscedasticity and deviations from normality in the residuals. In terms of our regression summary, x_large_bags is a statistically significant explanatory variable (p = 0.004). But hmmm… with four predictors, our overall R^2 was 0.6213, and now with five we’ve only reached 0.6214. Given that there is no real increase in explanatory performance, even though the term is significant, we might want to remove it. Let’s do that now by leaving it out of the models that follow.
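To put a number on "significant but not useful", it can help to line up R^2 and adjusted R^2 for the two nested models side by side. The following is only a minimal sketch on simulated data: the data frame `sim` and the model names `smaller` and `larger` are made up for illustration and stand in for trimmed_avocados and the four- and five-predictor models.

```r
# Simulated stand-in for trimmed_avocados (hypothetical data, not the real file)
set.seed(42)
n <- 400
sim <- data.frame(
  type = factor(sample(c("conventional", "organic"), n, replace = TRUE)),
  quarter = factor(sample(1:4, n, replace = TRUE)),
  x_large_bags = rexp(n, rate = 1 / 3000)
)
# In this simulation price depends on type and quarter only
sim$average_price <- 1.1 +
  0.5 * (sim$type == "organic") +
  0.05 * as.integer(sim$quarter) +
  rnorm(n, sd = 0.25)

smaller <- lm(average_price ~ type + quarter, data = sim)
larger <- update(smaller, . ~ . + x_large_bags)

# Side-by-side fit statistics for the two nested models
comparison <- data.frame(
  model = c("smaller", "larger"),
  r_squared = c(summary(smaller)$r.squared, summary(larger)$r.squared),
  adj_r_squared = c(summary(smaller)$adj.r.squared, summary(larger)$adj.r.squared)
)
comparison$gain_in_r_squared <- c(NA, diff(comparison$r_squared))
print(comparison)
```

Plain R^2 can never go down when a predictor is added, so the gain column is the thing to look at: a gain in the fourth decimal place, as we saw above, is a good reason to prefer the smaller model.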
It’s also clear we aren’t gaining anything by adding more main-effect predictors. The final thing we can do is test for interactions.
Let’s now think about possible pair interactions. With four main-effect variables (type + region + quarter + year), we have six possible pair interactions:
type:region
type:quarter
type:year
region:quarter
region:year
quarter:year
Let’s test each of these in turn.
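As an aside, the six candidate terms don't have to be typed out by hand. Here is a rough sketch, on simulated data, of generating every pair with `combn()` and refitting the base model once per interaction with `update()`. The data frame `sim` and the names `base_model`, `interaction_terms` and `results` are hypothetical and only there for illustration.

```r
# Simulated stand-in for trimmed_avocados (hypothetical data, not the real file)
set.seed(1)
n <- 600
sim <- data.frame(
  type = factor(sample(c("conventional", "organic"), n, replace = TRUE)),
  region = factor(sample(c("Albany", "Boston", "Houston"), n, replace = TRUE)),
  quarter = factor(sample(1:4, n, replace = TRUE)),
  year = factor(sample(2015:2017, n, replace = TRUE))
)
sim$average_price <- 1.2 + 0.5 * (sim$type == "organic") + rnorm(n, sd = 0.25)

main_effects <- c("type", "region", "quarter", "year")
base_model <- lm(reformulate(main_effects, response = "average_price"), data = sim)

# All unordered pairs of main effects, written as "a:b" interaction terms
interaction_terms <- combn(main_effects, 2, FUN = paste, collapse = ":")

# Refit the base model once per interaction and record adjusted R^2
adj_r2 <- sapply(interaction_terms, function(term) {
  fit <- update(base_model, as.formula(paste(". ~ . +", term)))
  summary(fit)$adj.r.squared
})

results <- data.frame(
  interaction = interaction_terms,
  adj_r_squared = unname(adj_r2)
)
print(results[order(-results$adj_r_squared), ])
```

This only collects one summary number per model, so it is a quick way to rank the candidates rather than a replacement for reading the full summaries.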
model5pa <- lm(average_price ~ type + region + quarter + year + type:region, data = trimmed_avocados)
summary(model5pa)
Call:
lm(formula = average_price ~ type + region + quarter + year +
type:region, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.0082 -0.1335 -0.0024 0.1335 1.4799
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.202843 0.018542 64.870 < 2e-16 ***
typeorganic 0.424556 0.025580 16.597 < 2e-16 ***
regionAtlanta -0.279941 0.025580 -10.944 < 2e-16 ***
regionBaltimoreWashington -0.004556 0.025580 -0.178 0.858635
regionBoise -0.272722 0.025580 -10.661 < 2e-16 ***
regionBoston -0.044379 0.025580 -1.735 0.082778 .
regionBuffaloRochester 0.033550 0.025580 1.312 0.189681
regionCalifornia -0.243314 0.025580 -9.512 < 2e-16 ***
regionCharlotte -0.073669 0.025580 -2.880 0.003983 **
regionChicago 0.020592 0.025580 0.805 0.420838
regionCincinnatiDayton -0.333254 0.025580 -13.028 < 2e-16 ***
regionColumbus -0.282485 0.025580 -11.043 < 2e-16 ***
regionDallasFtWorth -0.502308 0.025580 -19.637 < 2e-16 ***
regionDenver -0.274793 0.025580 -10.742 < 2e-16 ***
regionDetroit -0.224793 0.025580 -8.788 < 2e-16 ***
regionGrandRapids -0.023728 0.025580 -0.928 0.353635
regionGreatLakes -0.166864 0.025580 -6.523 7.07e-11 ***
regionHarrisburgScranton -0.089941 0.025580 -3.516 0.000439 ***
regionHartfordSpringfield 0.059290 0.025580 2.318 0.020471 *
regionHouston -0.523669 0.025580 -20.472 < 2e-16 ***
regionIndianapolis -0.203905 0.025580 -7.971 1.66e-15 ***
regionJacksonville -0.155148 0.025580 -6.065 1.34e-09 ***
regionLasVegas -0.335799 0.025580 -13.127 < 2e-16 ***
regionLosAngeles -0.372308 0.025580 -14.555 < 2e-16 ***
regionLouisville -0.243432 0.025580 -9.516 < 2e-16 ***
regionMiamiFtLauderdale -0.094438 0.025580 -3.692 0.000223 ***
regionMidsouth -0.141598 0.025580 -5.535 3.15e-08 ***
regionNashville -0.335858 0.025580 -13.130 < 2e-16 ***
regionNewOrleansMobile -0.263491 0.025580 -10.301 < 2e-16 ***
regionNewYork 0.053373 0.025580 2.086 0.036948 *
regionNortheast -0.004320 0.025580 -0.169 0.865907
regionNorthernNewEngland -0.088521 0.025580 -3.461 0.000540 ***
regionOrlando -0.134320 0.025580 -5.251 1.53e-07 ***
regionPhiladelphia 0.047574 0.025580 1.860 0.062930 .
regionPhoenixTucson -0.620533 0.025580 -24.258 < 2e-16 ***
regionPittsburgh -0.098107 0.025580 -3.835 0.000126 ***
regionPlains -0.183254 0.025580 -7.164 8.14e-13 ***
regionPortland -0.302249 0.025580 -11.816 < 2e-16 ***
regionRaleighGreensboro -0.121657 0.025580 -4.756 1.99e-06 ***
regionRichmondNorfolk -0.228935 0.025580 -8.950 < 2e-16 ***
regionRoanoke -0.252722 0.025580 -9.880 < 2e-16 ***
regionSacramento -0.074793 0.025580 -2.924 0.003461 **
regionSanDiego -0.287278 0.025580 -11.230 < 2e-16 ***
regionSanFrancisco 0.048402 0.025580 1.892 0.058483 .
regionSeattle -0.178994 0.025580 -6.997 2.70e-12 ***
regionSouthCarolina -0.202544 0.025580 -7.918 2.55e-15 ***
regionSouthCentral -0.479349 0.025580 -18.739 < 2e-16 ***
regionSoutheast -0.185740 0.025580 -7.261 4.00e-13 ***
regionSpokane -0.232781 0.025580 -9.100 < 2e-16 ***
regionStLouis -0.163018 0.025580 -6.373 1.90e-10 ***
regionSyracuse 0.038166 0.025580 1.492 0.135716
regionTampa -0.147160 0.025580 -5.753 8.91e-09 ***
regionTotalUS -0.256746 0.025580 -10.037 < 2e-16 ***
regionWest -0.363669 0.025580 -14.217 < 2e-16 ***
regionWestTexNewMexico -0.506627 0.025580 -19.805 < 2e-16 ***
quarter2 0.081206 0.005125 15.846 < 2e-16 ***
quarter3 0.218901 0.005124 42.721 < 2e-16 ***
quarter4 0.162013 0.005092 31.814 < 2e-16 ***
year2016 -0.037010 0.004438 -8.340 < 2e-16 ***
year2017 0.138688 0.004417 31.396 < 2e-16 ***
year2018 0.087411 0.007895 11.071 < 2e-16 ***
typeorganic:regionAtlanta 0.113728 0.036176 3.144 0.001671 **
typeorganic:regionBaltimoreWashington -0.044497 0.036176 -1.230 0.218705
typeorganic:regionBoise 0.119645 0.036176 3.307 0.000944 ***
typeorganic:regionBoston 0.028462 0.036176 0.787 0.431435
typeorganic:regionBuffaloRochester -0.155503 0.036176 -4.299 1.73e-05 ***
typeorganic:regionCalifornia 0.155207 0.036176 4.290 1.79e-05 ***
typeorganic:regionCharlotte 0.237337 0.036176 6.561 5.50e-11 ***
typeorganic:regionChicago -0.049704 0.036176 -1.374 0.169471
typeorganic:regionCincinnatiDayton -0.037160 0.036176 -1.027 0.304341
typeorganic:regionColumbus -0.051538 0.036176 -1.425 0.154271
typeorganic:regionDallasFtWorth 0.053728 0.036176 1.485 0.137512
typeorganic:regionDenver -0.135325 0.036176 -3.741 0.000184 ***
typeorganic:regionDetroit -0.120296 0.036176 -3.325 0.000885 ***
typeorganic:regionGrandRapids -0.064615 0.036176 -1.786 0.074092 .
typeorganic:regionGreatLakes -0.111243 0.036176 -3.075 0.002108 **
typeorganic:regionHarrisburgScranton 0.084379 0.036176 2.332 0.019687 *
typeorganic:regionHartfordSpringfield 0.396627 0.036176 10.964 < 2e-16 ***
typeorganic:regionHouston 0.021124 0.036176 0.584 0.559273
typeorganic:regionIndianapolis -0.086272 0.036176 -2.385 0.017099 *
typeorganic:regionJacksonville 0.210118 0.036176 5.808 6.42e-09 ***
typeorganic:regionLasVegas 0.311361 0.036176 8.607 < 2e-16 ***
typeorganic:regionLosAngeles 0.054556 0.036176 1.508 0.131550
typeorganic:regionLouisville -0.061834 0.036176 -1.709 0.087418 .
typeorganic:regionMiamiFtLauderdale -0.076213 0.036176 -2.107 0.035154 *
typeorganic:regionMidsouth -0.029349 0.036176 -0.811 0.417210
typeorganic:regionNashville -0.026154 0.036176 -0.723 0.469711
typeorganic:regionNewOrleansMobile 0.014497 0.036176 0.401 0.688618
typeorganic:regionNewYork 0.226331 0.036176 6.256 4.03e-10 ***
typeorganic:regionNortheast 0.090414 0.036176 2.499 0.012453 *
typeorganic:regionNorthernNewEngland 0.009763 0.036176 0.270 0.787252
typeorganic:regionOrlando 0.158994 0.036176 4.395 1.11e-05 ***
typeorganic:regionPhiladelphia 0.047041 0.036176 1.300 0.193496
typeorganic:regionPhoenixTucson 0.567870 0.036176 15.697 < 2e-16 ***
typeorganic:regionPittsburgh -0.197219 0.036176 -5.452 5.05e-08 ***
typeorganic:regionPlains 0.117456 0.036176 3.247 0.001169 **
typeorganic:regionPortland 0.117870 0.036176 3.258 0.001123 **
typeorganic:regionRaleighGreensboro 0.231479 0.036176 6.399 1.61e-10 ***
typeorganic:regionRichmondNorfolk -0.081538 0.036176 -2.254 0.024211 *
typeorganic:regionRoanoke -0.120769 0.036176 -3.338 0.000844 ***
typeorganic:regionSacramento 0.270651 0.036176 7.482 7.68e-14 ***
typeorganic:regionSanDiego 0.248817 0.036176 6.878 6.27e-12 ***
typeorganic:regionSanFrancisco 0.389527 0.036176 10.768 < 2e-16 ***
typeorganic:regionSeattle 0.121065 0.036176 3.347 0.000820 ***
typeorganic:regionSouthCarolina 0.089586 0.036176 2.476 0.013281 *
typeorganic:regionSouthCentral 0.039112 0.036176 1.081 0.279633
typeorganic:regionSoutheast 0.045444 0.036176 1.256 0.209063
typeorganic:regionSpokane 0.234675 0.036176 6.487 8.98e-11 ***
typeorganic:regionStLouis 0.065207 0.036176 1.803 0.071483 .
typeorganic:regionSyracuse -0.157751 0.036176 -4.361 1.30e-05 ***
typeorganic:regionTampa -0.010059 0.036176 -0.278 0.780967
typeorganic:regionTotalUS 0.029467 0.036176 0.815 0.415334
typeorganic:regionWest 0.149704 0.036176 4.138 3.52e-05 ***
typeorganic:regionWestTexNewMexico 0.423157 0.036257 11.671 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2351 on 18135 degrees of freedom
Multiple R-squared: 0.6611, Adjusted R-squared: 0.659
F-statistic: 313.1 on 113 and 18135 DF, p-value: < 2.2e-16
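An interaction involving region adds one coefficient for every non-baseline region, so reading individual p-values row by row is hard going. One option is a single F-test for the whole interaction term, comparing the nested models with `anova()`. This is just a sketch on simulated data with a built-in type-by-region effect; `sim`, `main_only` and `with_interaction` are made-up names, not objects from the analysis above.

```r
# Simulated stand-in for trimmed_avocados (hypothetical data, not the real file)
set.seed(7)
n <- 800
sim <- data.frame(
  type = factor(sample(c("conventional", "organic"), n, replace = TRUE)),
  region = factor(sample(c("Albany", "Boston", "Houston", "Seattle"), n, replace = TRUE))
)
# Build in a genuine interaction: the organic premium is larger in Seattle
premium <- ifelse(sim$region == "Seattle", 0.8, 0.4)
sim$average_price <- 1.2 + premium * (sim$type == "organic") + rnorm(n, sd = 0.25)

main_only <- lm(average_price ~ type + region, data = sim)
with_interaction <- lm(average_price ~ type + region + type:region, data = sim)

# One F-test for all of the interaction coefficients jointly
f_test <- anova(main_only, with_interaction)
print(f_test)
```

The second row of the table reports how many extra coefficients the interaction uses (the Df column) and whether, taken together, they reduce the residual sum of squares by more than chance would.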
model5pb <- lm(average_price ~ type + region + quarter + year + type:quarter, data = trimmed_avocados)
summary(model5pb)
Call:
lm(formula = average_price ~ type + region + quarter + year +
type:quarter, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.02358 -0.14643 -0.00311 0.14370 1.44227
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.180432 0.014545 81.158 < 2e-16 ***
typeorganic 0.469434 0.006682 70.256 < 2e-16 ***
regionAtlanta -0.223077 0.019073 -11.696 < 2e-16 ***
regionBaltimoreWashington -0.026805 0.019073 -1.405 0.159924
regionBoise -0.212899 0.019073 -11.162 < 2e-16 ***
regionBoston -0.030148 0.019073 -1.581 0.113971
regionBuffaloRochester -0.044201 0.019073 -2.317 0.020488 *
regionCalifornia -0.165710 0.019073 -8.688 < 2e-16 ***
regionCharlotte 0.045000 0.019073 2.359 0.018316 *
regionChicago -0.004260 0.019073 -0.223 0.823248
regionCincinnatiDayton -0.351834 0.019073 -18.447 < 2e-16 ***
regionColumbus -0.308254 0.019073 -16.162 < 2e-16 ***
regionDallasFtWorth -0.475444 0.019073 -24.928 < 2e-16 ***
regionDenver -0.342456 0.019073 -17.955 < 2e-16 ***
regionDetroit -0.284941 0.019073 -14.940 < 2e-16 ***
regionGrandRapids -0.056036 0.019073 -2.938 0.003308 **
regionGreatLakes -0.222485 0.019073 -11.665 < 2e-16 ***
regionHarrisburgScranton -0.047751 0.019073 -2.504 0.012301 *
regionHartfordSpringfield 0.257604 0.019073 13.506 < 2e-16 ***
regionHouston -0.513107 0.019073 -26.902 < 2e-16 ***
regionIndianapolis -0.247041 0.019073 -12.953 < 2e-16 ***
regionJacksonville -0.050089 0.019073 -2.626 0.008642 **
regionLasVegas -0.180118 0.019073 -9.444 < 2e-16 ***
regionLosAngeles -0.345030 0.019073 -18.090 < 2e-16 ***
regionLouisville -0.274349 0.019073 -14.384 < 2e-16 ***
regionMiamiFtLauderdale -0.132544 0.019073 -6.949 3.79e-12 ***
regionMidsouth -0.156272 0.019073 -8.193 2.71e-16 ***
regionNashville -0.348935 0.019073 -18.295 < 2e-16 ***
regionNewOrleansMobile -0.256243 0.019073 -13.435 < 2e-16 ***
regionNewYork 0.166538 0.019073 8.732 < 2e-16 ***
regionNortheast 0.040888 0.019073 2.144 0.032066 *
regionNorthernNewEngland -0.083639 0.019073 -4.385 1.17e-05 ***
regionOrlando -0.054822 0.019073 -2.874 0.004053 **
regionPhiladelphia 0.071095 0.019073 3.728 0.000194 ***
regionPhoenixTucson -0.336598 0.019073 -17.648 < 2e-16 ***
regionPittsburgh -0.196716 0.019073 -10.314 < 2e-16 ***
regionPlains -0.124527 0.019073 -6.529 6.80e-11 ***
regionPortland -0.243314 0.019073 -12.757 < 2e-16 ***
regionRaleighGreensboro -0.005917 0.019073 -0.310 0.756382
regionRichmondNorfolk -0.269704 0.019073 -14.141 < 2e-16 ***
regionRoanoke -0.313107 0.019073 -16.416 < 2e-16 ***
regionSacramento 0.060533 0.019073 3.174 0.001507 **
regionSanDiego -0.162870 0.019073 -8.539 < 2e-16 ***
regionSanFrancisco 0.243166 0.019073 12.749 < 2e-16 ***
regionSeattle -0.118462 0.019073 -6.211 5.38e-10 ***
regionSouthCarolina -0.157751 0.019073 -8.271 < 2e-16 ***
regionSouthCentral -0.459793 0.019073 -24.107 < 2e-16 ***
regionSoutheast -0.163018 0.019073 -8.547 < 2e-16 ***
regionSpokane -0.115444 0.019073 -6.053 1.45e-09 ***
regionStLouis -0.130414 0.019073 -6.838 8.30e-12 ***
regionSyracuse -0.040710 0.019073 -2.134 0.032819 *
regionTampa -0.152189 0.019073 -7.979 1.56e-15 ***
regionTotalUS -0.242012 0.019073 -12.689 < 2e-16 ***
regionWest -0.288817 0.019073 -15.143 < 2e-16 ***
regionWestTexNewMexico -0.296626 0.019116 -15.518 < 2e-16 ***
quarter2 0.066217 0.007413 8.933 < 2e-16 ***
quarter3 0.186137 0.007413 25.110 < 2e-16 ***
quarter4 0.152474 0.007364 20.706 < 2e-16 ***
year2016 -0.036977 0.004679 -7.902 2.89e-15 ***
year2017 0.138659 0.004658 29.768 < 2e-16 ***
year2018 0.087412 0.008325 10.500 < 2e-16 ***
typeorganic:quarter2 0.029809 0.010152 2.936 0.003325 **
typeorganic:quarter3 0.065528 0.010150 6.456 1.10e-10 ***
typeorganic:quarter4 0.018995 0.010079 1.885 0.059501 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2479 on 18185 degrees of freedom
Multiple R-squared: 0.6222, Adjusted R-squared: 0.6209
F-statistic: 475.3 on 63 and 18185 DF, p-value: < 2.2e-16
model5pc <- lm(average_price ~ type + region + quarter + year + type:year, data = trimmed_avocados)
summary(model5pc)
Call:
lm(formula = average_price ~ type + region + quarter + year +
type:year, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.00911 -0.14461 -0.00436 0.13900 1.46703
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.117496 0.014421 77.493 < 2e-16 ***
typeorganic 0.595327 0.006565 90.688 < 2e-16 ***
regionAtlanta -0.223077 0.018919 -11.791 < 2e-16 ***
regionBaltimoreWashington -0.026805 0.018919 -1.417 0.156565
regionBoise -0.212899 0.018919 -11.253 < 2e-16 ***
regionBoston -0.030148 0.018919 -1.593 0.111069
regionBuffaloRochester -0.044201 0.018919 -2.336 0.019488 *
regionCalifornia -0.165710 0.018919 -8.759 < 2e-16 ***
regionCharlotte 0.045000 0.018919 2.379 0.017393 *
regionChicago -0.004260 0.018919 -0.225 0.821839
regionCincinnatiDayton -0.351834 0.018919 -18.596 < 2e-16 ***
regionColumbus -0.308254 0.018919 -16.293 < 2e-16 ***
regionDallasFtWorth -0.475444 0.018919 -25.130 < 2e-16 ***
regionDenver -0.342456 0.018919 -18.101 < 2e-16 ***
regionDetroit -0.284941 0.018919 -15.061 < 2e-16 ***
regionGrandRapids -0.056036 0.018919 -2.962 0.003063 **
regionGreatLakes -0.222485 0.018919 -11.760 < 2e-16 ***
regionHarrisburgScranton -0.047751 0.018919 -2.524 0.011613 *
regionHartfordSpringfield 0.257604 0.018919 13.616 < 2e-16 ***
regionHouston -0.513107 0.018919 -27.121 < 2e-16 ***
regionIndianapolis -0.247041 0.018919 -13.058 < 2e-16 ***
regionJacksonville -0.050089 0.018919 -2.647 0.008117 **
regionLasVegas -0.180118 0.018919 -9.520 < 2e-16 ***
regionLosAngeles -0.345030 0.018919 -18.237 < 2e-16 ***
regionLouisville -0.274349 0.018919 -14.501 < 2e-16 ***
regionMiamiFtLauderdale -0.132544 0.018919 -7.006 2.54e-12 ***
regionMidsouth -0.156272 0.018919 -8.260 < 2e-16 ***
regionNashville -0.348935 0.018919 -18.443 < 2e-16 ***
regionNewOrleansMobile -0.256243 0.018919 -13.544 < 2e-16 ***
regionNewYork 0.166538 0.018919 8.802 < 2e-16 ***
regionNortheast 0.040888 0.018919 2.161 0.030698 *
regionNorthernNewEngland -0.083639 0.018919 -4.421 9.89e-06 ***
regionOrlando -0.054822 0.018919 -2.898 0.003764 **
regionPhiladelphia 0.071095 0.018919 3.758 0.000172 ***
regionPhoenixTucson -0.336598 0.018919 -17.791 < 2e-16 ***
regionPittsburgh -0.196716 0.018919 -10.398 < 2e-16 ***
regionPlains -0.124527 0.018919 -6.582 4.77e-11 ***
regionPortland -0.243314 0.018919 -12.860 < 2e-16 ***
regionRaleighGreensboro -0.005917 0.018919 -0.313 0.754471
regionRichmondNorfolk -0.269704 0.018919 -14.255 < 2e-16 ***
regionRoanoke -0.313107 0.018919 -16.549 < 2e-16 ***
regionSacramento 0.060533 0.018919 3.199 0.001379 **
regionSanDiego -0.162870 0.018919 -8.609 < 2e-16 ***
regionSanFrancisco 0.243166 0.018919 12.853 < 2e-16 ***
regionSeattle -0.118462 0.018919 -6.261 3.90e-10 ***
regionSouthCarolina -0.157751 0.018919 -8.338 < 2e-16 ***
regionSouthCentral -0.459793 0.018919 -24.303 < 2e-16 ***
regionSoutheast -0.163018 0.018919 -8.616 < 2e-16 ***
regionSpokane -0.115444 0.018919 -6.102 1.07e-09 ***
regionStLouis -0.130414 0.018919 -6.893 5.64e-12 ***
regionSyracuse -0.040710 0.018919 -2.152 0.031430 *
regionTampa -0.152189 0.018919 -8.044 9.22e-16 ***
regionTotalUS -0.242012 0.018919 -12.792 < 2e-16 ***
regionWest -0.288817 0.018919 -15.266 < 2e-16 ***
regionWestTexNewMexico -0.296641 0.018962 -15.644 < 2e-16 ***
quarter2 0.081108 0.005360 15.132 < 2e-16 ***
quarter3 0.218901 0.005359 40.844 < 2e-16 ***
quarter4 0.161984 0.005327 30.410 < 2e-16 ***
year2016 0.027632 0.006564 4.210 2.57e-05 ***
year2017 0.216048 0.006533 33.069 < 2e-16 ***
year2018 0.165421 0.011209 14.758 < 2e-16 ***
typeorganic:year2016 -0.129237 0.009283 -13.921 < 2e-16 ***
typeorganic:year2017 -0.154818 0.009240 -16.755 < 2e-16 ***
typeorganic:year2018 -0.156037 0.015159 -10.293 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.246 on 18185 degrees of freedom
Multiple R-squared: 0.6282, Adjusted R-squared: 0.6269
F-statistic: 487.7 on 63 and 18185 DF, p-value: < 2.2e-16
model5pd <- lm(average_price ~ type + region + quarter + year + region:quarter, data = trimmed_avocados)
summary(model5pd)
Call:
lm(formula = average_price ~ type + region + quarter + year +
region:quarter, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.06598 -0.14588 0.00059 0.14115 1.38051
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.216463 0.024241 50.182 < 2e-16 ***
typeorganic 0.495917 0.003583 138.408 < 2e-16 ***
regionAtlanta -0.257647 0.033888 -7.603 3.04e-14 ***
regionBaltimoreWashington -0.089804 0.033888 -2.650 0.008056 **
regionBoise -0.285392 0.033888 -8.422 < 2e-16 ***
regionBoston -0.007059 0.033888 -0.208 0.835000
regionBuffaloRochester -0.031078 0.033888 -0.917 0.359111
regionCalifornia -0.279706 0.033888 -8.254 < 2e-16 ***
regionCharlotte -0.021471 0.033888 -0.634 0.526370
regionChicago -0.073627 0.033888 -2.173 0.029820 *
regionCincinnatiDayton -0.434902 0.033888 -12.833 < 2e-16 ***
regionColumbus -0.324804 0.033888 -9.585 < 2e-16 ***
regionDallasFtWorth -0.484510 0.033888 -14.297 < 2e-16 ***
regionDenver -0.421569 0.033888 -12.440 < 2e-16 ***
regionDetroit -0.305000 0.033888 -9.000 < 2e-16 ***
regionGrandRapids -0.128235 0.033888 -3.784 0.000155 ***
regionGreatLakes -0.268137 0.033888 -7.912 2.67e-15 ***
regionHarrisburgScranton -0.060000 0.033888 -1.771 0.076657 .
regionHartfordSpringfield 0.229020 0.033888 6.758 1.44e-11 ***
regionHouston -0.537059 0.033888 -15.848 < 2e-16 ***
regionIndianapolis -0.273824 0.033888 -8.080 6.87e-16 ***
regionJacksonville -0.110392 0.033888 -3.258 0.001126 **
regionLasVegas -0.290686 0.033888 -8.578 < 2e-16 ***
regionLosAngeles -0.433039 0.033888 -12.778 < 2e-16 ***
regionLouisville -0.295490 0.033888 -8.720 < 2e-16 ***
regionMiamiFtLauderdale -0.111863 0.033888 -3.301 0.000966 ***
regionMidsouth -0.194510 0.033888 -5.740 9.64e-09 ***
regionNashville -0.351275 0.033888 -10.366 < 2e-16 ***
regionNewOrleansMobile -0.317255 0.033888 -9.362 < 2e-16 ***
regionNewYork 0.105098 0.033888 3.101 0.001930 **
regionNortheast 0.020000 0.033888 0.590 0.555082
regionNorthernNewEngland -0.059804 0.033888 -1.765 0.077625 .
regionOrlando -0.103431 0.033888 -3.052 0.002276 **
regionPhiladelphia 0.016569 0.033888 0.489 0.624905
regionPhoenixTucson -0.445294 0.033888 -13.140 < 2e-16 ***
regionPittsburgh -0.174510 0.033888 -5.150 2.64e-07 ***
regionPlains -0.184412 0.033888 -5.442 5.34e-08 ***
regionPortland -0.353235 0.033888 -10.424 < 2e-16 ***
regionRaleighGreensboro -0.058039 0.033888 -1.713 0.086792 .
regionRichmondNorfolk -0.263627 0.033888 -7.779 7.68e-15 ***
regionRoanoke -0.312255 0.033888 -9.214 < 2e-16 ***
regionSacramento -0.027059 0.033888 -0.798 0.424608
regionSanDiego -0.286667 0.033888 -8.459 < 2e-16 ***
regionSanFrancisco 0.090588 0.033888 2.673 0.007521 **
regionSeattle -0.258824 0.033888 -7.638 2.32e-14 ***
regionSouthCarolina -0.206961 0.033888 -6.107 1.04e-09 ***
regionSouthCentral -0.475686 0.033888 -14.037 < 2e-16 ***
regionSoutheast -0.207255 0.033888 -6.116 9.80e-10 ***
regionSpokane -0.269608 0.033888 -7.956 1.88e-15 ***
regionStLouis -0.190980 0.033888 -5.636 1.77e-08 ***
regionSyracuse -0.027647 0.033888 -0.816 0.414609
regionTampa -0.153235 0.033888 -4.522 6.17e-06 ***
regionTotalUS -0.290392 0.033888 -8.569 < 2e-16 ***
regionWest -0.389020 0.033888 -11.479 < 2e-16 ***
regionWestTexNewMexico -0.365980 0.033888 -10.800 < 2e-16 ***
quarter2 0.085685 0.036447 2.351 0.018736 *
quarter3 0.093249 0.036447 2.558 0.010521 *
quarter4 0.071967 0.036188 1.989 0.046752 *
year2016 -0.036996 0.004567 -8.100 5.83e-16 ***
year2017 0.138600 0.004546 30.485 < 2e-16 ***
year2018 0.087387 0.008126 10.754 < 2e-16 ***
regionAtlanta:quarter2 -0.088379 0.051480 -1.717 0.086041 .
regionBaltimoreWashington:quarter2 0.092368 0.051480 1.794 0.072790 .
regionBoise:quarter2 -0.095505 0.051480 -1.855 0.063585 .
regionBoston:quarter2 0.011418 0.051480 0.222 0.824479
regionBuffaloRochester:quarter2 0.081719 0.051480 1.587 0.112440
regionCalifornia:quarter2 0.003552 0.051480 0.069 0.944992
regionCharlotte:quarter2 0.062240 0.051480 1.209 0.226676
regionChicago:quarter2 -0.004193 0.051480 -0.081 0.935085
regionCincinnatiDayton:quarter2 0.010030 0.051480 0.195 0.845524
regionColumbus:quarter2 -0.094042 0.051480 -1.827 0.067751 .
regionDallasFtWorth:quarter2 -0.078439 0.051480 -1.524 0.127607
regionDenver:quarter2 -0.015739 0.051480 -0.306 0.759813
regionDetroit:quarter2 -0.036923 0.051480 -0.717 0.473241
regionGrandRapids:quarter2 0.135799 0.051480 2.638 0.008349 **
regionGreatLakes:quarter2 -0.011478 0.051480 -0.223 0.823567
regionHarrisburgScranton:quarter2 0.065513 0.051480 1.273 0.203181
regionHartfordSpringfield:quarter2 0.067262 0.051480 1.307 0.191375
regionHouston:quarter2 -0.089223 0.051480 -1.733 0.083084 .
regionIndianapolis:quarter2 -0.064253 0.051480 -1.248 0.212003
regionJacksonville:quarter2 0.028213 0.051480 0.548 0.583677
regionLasVegas:quarter2 -0.074314 0.051480 -1.444 0.148885
regionLosAngeles:quarter2 -0.060679 0.051480 -1.179 0.238540
regionLouisville:quarter2 -0.074510 0.051480 -1.447 0.147816
regionMiamiFtLauderdale:quarter2 -0.009676 0.051480 -0.188 0.850917
regionMidsouth:quarter2 -0.013952 0.051480 -0.271 0.786385
regionNashville:quarter2 -0.102572 0.051480 -1.992 0.046336 *
regionNewOrleansMobile:quarter2 0.083793 0.051480 1.628 0.103609
regionNewYork:quarter2 0.087722 0.051480 1.704 0.088397 .
regionNortheast:quarter2 0.056410 0.051480 1.096 0.273195
regionNorthernNewEngland:quarter2 -0.067632 0.051480 -1.314 0.188947
regionOrlando:quarter2 0.018047 0.051480 0.351 0.725924
regionPhiladelphia:quarter2 0.109970 0.051480 2.136 0.032680 *
regionPhoenixTucson:quarter2 -0.020090 0.051480 -0.390 0.696351
regionPittsburgh:quarter2 -0.038054 0.051480 -0.739 0.459792
regionPlains:quarter2 -0.002896 0.051480 -0.056 0.955141
regionPortland:quarter2 -0.045354 0.051480 -0.881 0.378324
regionRaleighGreensboro:quarter2 0.001885 0.051480 0.037 0.970786
regionRichmondNorfolk:quarter2 -0.113552 0.051480 -2.206 0.027414 *
regionRoanoke:quarter2 -0.131207 0.051480 -2.549 0.010821 *
regionSacramento:quarter2 0.084238 0.051480 1.636 0.101788
regionSanDiego:quarter2 -0.003333 0.051480 -0.065 0.948374
regionSanFrancisco:quarter2 0.121976 0.051480 2.369 0.017828 *
regionSeattle:quarter2 0.012029 0.051480 0.234 0.815254
regionSouthCarolina:quarter2 0.027602 0.051480 0.536 0.591851
regionSouthCentral:quarter2 -0.072262 0.051480 -1.404 0.160426
regionSoutheast:quarter2 -0.005950 0.051480 -0.116 0.907984
regionSpokane:quarter2 0.009736 0.051480 0.189 0.849999
regionStLouis:quarter2 0.057006 0.051480 1.107 0.268161
regionSyracuse:quarter2 0.064955 0.051480 1.262 0.207057
regionTampa:quarter2 0.006056 0.051480 0.118 0.906359
regionTotalUS:quarter2 -0.009223 0.051480 -0.179 0.857813
regionWest:quarter2 -0.029186 0.051480 -0.567 0.570770
regionWestTexNewMexico:quarter2 -0.096213 0.051672 -1.862 0.062620 .
regionAtlanta:quarter3 0.122391 0.051480 2.377 0.017444 *
regionBaltimoreWashington:quarter3 0.095830 0.051480 1.861 0.062691 .
regionBoise:quarter3 0.251931 0.051480 4.894 9.98e-07 ***
regionBoston:quarter3 -0.001146 0.051480 -0.022 0.982235
regionBuffaloRochester:quarter3 -0.034050 0.051480 -0.661 0.508354
regionCalifornia:quarter3 0.255860 0.051480 4.970 6.75e-07 ***
regionCharlotte:quarter3 0.139804 0.051480 2.716 0.006620 **
regionChicago:quarter3 0.174012 0.051480 3.380 0.000726 ***
regionCincinnatiDayton:quarter3 0.212594 0.051480 4.130 3.65e-05 ***
regionColumbus:quarter3 0.109291 0.051480 2.123 0.033769 *
regionDallasFtWorth:quarter3 0.023228 0.051480 0.451 0.651852
regionDenver:quarter3 0.212466 0.051480 4.127 3.69e-05 ***
regionDetroit:quarter3 0.054872 0.051480 1.066 0.286490
regionGrandRapids:quarter3 0.091440 0.051480 1.776 0.075712 .
regionGreatLakes:quarter3 0.123522 0.051480 2.399 0.016432 *
regionHarrisburgScranton:quarter3 0.006795 0.051480 0.132 0.894993
regionHartfordSpringfield:quarter3 0.049442 0.051480 0.960 0.336862
regionHouston:quarter3 0.072059 0.051480 1.400 0.161608
regionIndianapolis:quarter3 0.092157 0.051480 1.790 0.073447 .
regionJacksonville:quarter3 0.168213 0.051480 3.268 0.001087 **
regionLasVegas:quarter3 0.295302 0.051480 5.736 9.84e-09 ***
regionLosAngeles:quarter3 0.214578 0.051480 4.168 3.08e-05 ***
regionLouisville:quarter3 0.084721 0.051480 1.646 0.099842 .
regionMiamiFtLauderdale:quarter3 -0.072240 0.051480 -1.403 0.160557
regionMidsouth:quarter3 0.095407 0.051480 1.853 0.063858 .
regionNashville:quarter3 0.041531 0.051480 0.807 0.419828
regionNewOrleansMobile:quarter3 0.071357 0.051480 1.386 0.165728
regionNewYork:quarter3 0.112338 0.051480 2.182 0.029110 *
regionNortheast:quarter3 0.050256 0.051480 0.976 0.328963
regionNorthernNewEngland:quarter3 -0.013658 0.051480 -0.265 0.790782
regionOrlando:quarter3 0.116252 0.051480 2.258 0.023946 *
regionPhiladelphia:quarter3 0.082149 0.051480 1.596 0.110562
regionPhoenixTucson:quarter3 0.260038 0.051480 5.051 4.43e-07 ***
regionPittsburgh:quarter3 -0.016131 0.051480 -0.313 0.754019
regionPlains:quarter3 0.136335 0.051480 2.648 0.008097 **
regionPortland:quarter3 0.334261 0.051480 6.493 8.63e-11 ***
regionRaleighGreensboro:quarter3 0.121373 0.051480 2.358 0.018401 *
regionRichmondNorfolk:quarter3 0.051576 0.051480 1.002 0.316421
regionRoanoke:quarter3 0.090460 0.051480 1.757 0.078903 .
regionSacramento:quarter3 0.181161 0.051480 3.519 0.000434 ***
regionSanDiego:quarter3 0.280385 0.051480 5.446 5.21e-08 ***
regionSanFrancisco:quarter3 0.312360 0.051480 6.068 1.32e-09 ***
regionSeattle:quarter3 0.392029 0.051480 7.615 2.76e-14 ***
regionSouthCarolina:quarter3 0.102345 0.051480 1.988 0.046820 *
regionSouthCentral:quarter3 0.042609 0.051480 0.828 0.407859
regionSoutheast:quarter3 0.111357 0.051480 2.163 0.030545 *
regionSpokane:quarter3 0.393582 0.051480 7.645 2.19e-14 ***
regionStLouis:quarter3 0.192134 0.051480 3.732 0.000190 ***
regionSyracuse:quarter3 -0.036840 0.051480 -0.716 0.474236
regionTampa:quarter3 -0.043047 0.051480 -0.836 0.403063
regionTotalUS:quarter3 0.104751 0.051480 2.035 0.041887 *
regionWest:quarter3 0.297609 0.051480 5.781 7.55e-09 ***
regionWestTexNewMexico:quarter3 0.178160 0.051480 3.461 0.000540 ***
regionAtlanta:quarter4 0.112897 0.051114 2.209 0.027206 *
regionBaltimoreWashington:quarter4 0.082679 0.051114 1.618 0.105780
regionBoise:quarter4 0.153767 0.051114 3.008 0.002631 **
regionBoston:quarter4 -0.107566 0.051114 -2.104 0.035355 *
regionBuffaloRochester:quarter4 -0.101922 0.051114 -1.994 0.046167 *
regionCalifornia:quarter4 0.228706 0.051114 4.474 7.71e-06 ***
regionCharlotte:quarter4 0.083846 0.051114 1.640 0.100948
regionChicago:quarter4 0.127502 0.051114 2.494 0.012624 *
regionCincinnatiDayton:quarter4 0.133902 0.051114 2.620 0.008809 **
regionColumbus:quarter4 0.055054 0.051114 1.077 0.281460
regionDallasFtWorth:quarter4 0.092135 0.051114 1.803 0.071479 .
regionDenver:quarter4 0.142444 0.051114 2.787 0.005329 **
regionDetroit:quarter4 0.067250 0.051114 1.316 0.188297
regionGrandRapids:quarter4 0.083485 0.051114 1.633 0.102421
regionGreatLakes:quarter4 0.083637 0.051114 1.636 0.101797
regionHarrisburgScranton:quarter4 -0.018750 0.051114 -0.367 0.713753
regionHartfordSpringfield:quarter4 0.006980 0.051114 0.137 0.891376
regionHouston:quarter4 0.117934 0.051114 2.307 0.021051 *
regionIndianapolis:quarter4 0.085949 0.051114 1.682 0.092683 .
regionJacksonville:quarter4 0.063267 0.051114 1.238 0.215820
regionLasVegas:quarter4 0.251686 0.051114 4.924 8.55e-07 ***
regionLosAngeles:quarter4 0.221789 0.051114 4.339 1.44e-05 ***
regionLouisville:quarter4 0.079365 0.051114 1.553 0.120511
regionMiamiFtLauderdale:quarter4 -0.007512 0.051114 -0.147 0.883157
regionMidsouth:quarter4 0.082135 0.051114 1.607 0.108096
regionNashville:quarter4 0.069400 0.051114 1.358 0.174564
regionNewOrleansMobile:quarter4 0.106505 0.051114 2.084 0.037204 *
regionNewYork:quarter4 0.064527 0.051114 1.262 0.206818
regionNortheast:quarter4 -0.015750 0.051114 -0.308 0.757984
regionNorthernNewEngland:quarter4 -0.021446 0.051114 -0.420 0.674803
regionOrlando:quarter4 0.074431 0.051114 1.456 0.145360
regionPhiladelphia:quarter4 0.043056 0.051114 0.842 0.399599
regionPhoenixTucson:quarter4 0.225294 0.051114 4.408 1.05e-05 ***
[ reached getOption("max.print") -- omitted 20 rows ]
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.242 on 18029 degrees of freedom
Multiple R-squared: 0.6431, Adjusted R-squared: 0.6388
F-statistic: 148.4 on 219 and 18029 DF, p-value: < 2.2e-16
model5pe <- lm(average_price ~ type + region + quarter + year + region:year, data = trimmed_avocados)
summary(model5pe)
Call:
lm(formula = average_price ~ type + region + quarter + year +
region:year, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-1.03093 -0.14190 -0.00143 0.13797 1.38892
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.175e+00 2.396e-02 49.047 < 2e-16 ***
typeorganic 4.959e-01 3.575e-03 138.719 < 2e-16 ***
regionAtlanta -1.582e-01 3.349e-02 -4.724 2.33e-06 ***
regionBaltimoreWashington -1.699e-01 3.349e-02 -5.074 3.94e-07 ***
regionBoise -1.650e-01 3.349e-02 -4.927 8.40e-07 ***
regionBoston -6.519e-02 3.349e-02 -1.947 0.051566 .
regionBuffaloRochester 5.865e-03 3.349e-02 0.175 0.860955
regionCalifornia -2.229e-01 3.349e-02 -6.656 2.89e-11 ***
regionCharlotte 3.702e-02 3.349e-02 1.106 0.268948
regionChicago -1.347e-01 3.349e-02 -4.023 5.77e-05 ***
regionCincinnatiDayton -3.364e-01 3.349e-02 -10.047 < 2e-16 ***
regionColumbus -2.649e-01 3.349e-02 -7.911 2.70e-15 ***
regionDallasFtWorth -4.609e-01 3.349e-02 -13.763 < 2e-16 ***
regionDenver -3.510e-01 3.349e-02 -10.481 < 2e-16 ***
regionDetroit -2.005e-01 3.349e-02 -5.987 2.18e-09 ***
regionGrandRapids -1.224e-01 3.349e-02 -3.655 0.000258 ***
regionGreatLakes -2.125e-01 3.349e-02 -6.346 2.26e-10 ***
regionHarrisburgScranton -6.712e-02 3.349e-02 -2.004 0.045053 *
regionHartfordSpringfield 2.090e-01 3.349e-02 6.243 4.40e-10 ***
regionHouston -4.907e-01 3.349e-02 -14.653 < 2e-16 ***
regionIndianapolis -1.958e-01 3.349e-02 -5.846 5.11e-09 ***
regionJacksonville -3.567e-02 3.349e-02 -1.065 0.286744
regionLasVegas -1.699e-01 3.349e-02 -5.074 3.94e-07 ***
regionLosAngeles -3.862e-01 3.349e-02 -11.535 < 2e-16 ***
regionLouisville -2.443e-01 3.349e-02 -7.296 3.08e-13 ***
regionMiamiFtLauderdale -1.552e-01 3.349e-02 -4.635 3.60e-06 ***
regionMidsouth -1.874e-01 3.349e-02 -5.597 2.22e-08 ***
regionNashville -2.615e-01 3.349e-02 -7.810 6.01e-15 ***
regionNewOrleansMobile -2.711e-01 3.349e-02 -8.095 6.10e-16 ***
regionNewYork 1.058e-01 3.349e-02 3.159 0.001588 **
regionNortheast 5.000e-03 3.349e-02 0.149 0.881305
regionNorthernNewEngland -6.538e-02 3.349e-02 -1.953 0.050881 .
regionOrlando -3.942e-02 3.349e-02 -1.177 0.239087
regionPhiladelphia 1.644e-02 3.349e-02 0.491 0.623415
regionPhoenixTucson -3.816e-01 3.349e-02 -11.397 < 2e-16 ***
regionPittsburgh -1.315e-01 3.349e-02 -3.928 8.59e-05 ***
regionPlains -1.009e-01 3.349e-02 -3.012 0.002597 **
regionPortland -2.319e-01 3.349e-02 -6.926 4.47e-12 ***
regionRaleighGreensboro -8.933e-02 3.349e-02 -2.668 0.007646 **
regionRichmondNorfolk -2.642e-01 3.349e-02 -7.891 3.17e-15 ***
regionRoanoke -3.116e-01 3.349e-02 -9.306 < 2e-16 ***
regionSacramento -8.471e-02 3.349e-02 -2.530 0.011422 *
regionSanDiego -2.645e-01 3.349e-02 -7.899 2.96e-15 ***
regionSanFrancisco 8.231e-02 3.349e-02 2.458 0.013981 *
regionSeattle -1.165e-01 3.349e-02 -3.480 0.000502 ***
regionSouthCarolina -8.404e-02 3.349e-02 -2.510 0.012093 *
regionSouthCentral -4.267e-01 3.349e-02 -12.744 < 2e-16 ***
regionSoutheast -1.240e-01 3.349e-02 -3.704 0.000213 ***
regionSpokane -1.384e-01 3.349e-02 -4.132 3.61e-05 ***
regionStLouis -3.538e-02 3.349e-02 -1.057 0.290659
regionSyracuse -9.712e-03 3.349e-02 -0.290 0.771804
regionTampa -1.821e-01 3.349e-02 -5.439 5.44e-08 ***
regionTotalUS -2.813e-01 3.349e-02 -8.402 < 2e-16 ***
regionWest -3.010e-01 3.349e-02 -8.988 < 2e-16 ***
regionWestTexNewMexico -2.766e-01 3.357e-02 -8.239 < 2e-16 ***
quarter2 8.108e-02 5.262e-03 15.407 < 2e-16 ***
quarter3 2.189e-01 5.262e-03 41.602 < 2e-16 ***
quarter4 1.620e-01 5.229e-03 30.974 < 2e-16 ***
year2016 -4.808e-03 3.349e-02 -0.144 0.885838
year2017 9.820e-02 3.333e-02 2.947 0.003217 **
year2018 1.257e-02 5.478e-02 0.230 0.818454
regionAtlanta:year2016 -1.616e-01 4.736e-02 -3.413 0.000643 ***
regionBaltimoreWashington:year2016 2.236e-01 4.736e-02 4.721 2.37e-06 ***
regionBoise:year2016 -2.270e-01 4.736e-02 -4.794 1.65e-06 ***
regionBoston:year2016 -4.260e-02 4.736e-02 -0.899 0.368404
regionBuffaloRochester:year2016 -5.596e-02 4.736e-02 -1.182 0.237332
regionCalifornia:year2016 1.885e-02 4.736e-02 0.398 0.690658
regionCharlotte:year2016 -7.308e-02 4.736e-02 -1.543 0.122814
regionChicago:year2016 1.481e-01 4.736e-02 3.127 0.001769 **
regionCincinnatiDayton:year2016 -1.091e-01 4.736e-02 -2.305 0.021203 *
regionColumbus:year2016 -8.269e-02 4.736e-02 -1.746 0.080796 .
regionDallasFtWorth:year2016 -7.692e-02 4.736e-02 -1.624 0.104317
regionDenver:year2016 -8.981e-02 4.736e-02 -1.896 0.057918 .
regionDetroit:year2016 -1.611e-01 4.736e-02 -3.401 0.000673 ***
regionGrandRapids:year2016 9.779e-02 4.736e-02 2.065 0.038940 *
regionGreatLakes:year2016 -4.442e-02 4.736e-02 -0.938 0.348222
regionHarrisburgScranton:year2016 4.481e-02 4.736e-02 0.946 0.344065
regionHartfordSpringfield:year2016 1.081e-01 4.736e-02 2.282 0.022488 *
regionHouston:year2016 -5.135e-02 4.736e-02 -1.084 0.278264
regionIndianapolis:year2016 -3.663e-02 4.736e-02 -0.774 0.439177
regionJacksonville:year2016 -1.306e-01 4.736e-02 -2.757 0.005833 **
regionLasVegas:year2016 -1.163e-02 4.736e-02 -0.246 0.805929
regionLosAngeles:year2016 -6.394e-02 4.736e-02 -1.350 0.176953
regionLouisville:year2016 -7.808e-02 4.736e-02 -1.649 0.099221 .
regionMiamiFtLauderdale:year2016 -9.894e-02 4.736e-02 -2.089 0.036692 *
regionMidsouth:year2016 4.327e-03 4.736e-02 0.091 0.927199
regionNashville:year2016 -1.562e-01 4.736e-02 -3.299 0.000971 ***
regionNewOrleansMobile:year2016 -1.423e-02 4.736e-02 -0.301 0.763794
regionNewYork:year2016 1.223e-01 4.736e-02 2.583 0.009810 **
regionNortheast:year2016 5.673e-02 4.736e-02 1.198 0.230946
regionNorthernNewEngland:year2016 -7.587e-02 4.736e-02 -1.602 0.109168
regionOrlando:year2016 -1.237e-01 4.736e-02 -2.613 0.008978 **
regionPhiladelphia:year2016 1.244e-01 4.736e-02 2.627 0.008611 **
regionPhoenixTucson:year2016 1.064e-01 4.736e-02 2.248 0.024607 *
regionPittsburgh:year2016 -5.904e-02 4.736e-02 -1.247 0.212525
regionPlains:year2016 -5.558e-02 4.736e-02 -1.174 0.240571
regionPortland:year2016 -1.104e-01 4.736e-02 -2.331 0.019767 *
regionRaleighGreensboro:year2016 3.173e-03 4.736e-02 0.067 0.946579
regionRichmondNorfolk:year2016 -5.856e-02 4.736e-02 -1.237 0.216273
regionRoanoke:year2016 -7.481e-02 4.736e-02 -1.580 0.114195
regionSacramento:year2016 2.189e-01 4.736e-02 4.623 3.80e-06 ***
regionSanDiego:year2016 4.433e-02 4.736e-02 0.936 0.349267
regionSanFrancisco:year2016 2.650e-01 4.736e-02 5.596 2.23e-08 ***
regionSeattle:year2016 -1.171e-01 4.736e-02 -2.473 0.013404 *
regionSouthCarolina:year2016 -1.449e-01 4.736e-02 -3.060 0.002217 **
regionSouthCentral:year2016 -8.029e-02 4.736e-02 -1.695 0.090012 .
regionSoutheast:year2016 -1.230e-01 4.736e-02 -2.597 0.009413 **
regionSpokane:year2016 -6.202e-02 4.736e-02 -1.310 0.190334
regionStLouis:year2016 -3.131e-01 4.736e-02 -6.611 3.92e-11 ***
regionSyracuse:year2016 -2.077e-02 4.736e-02 -0.439 0.660973
regionTampa:year2016 -8.731e-02 4.736e-02 -1.844 0.065251 .
regionTotalUS:year2016 1.096e-02 4.736e-02 0.231 0.816951
regionWest:year2016 -5.212e-02 4.736e-02 -1.101 0.271127
regionWestTexNewMexico:year2016 -1.074e-02 4.741e-02 -0.226 0.820854
regionAtlanta:year2017 -5.088e-02 4.713e-02 -1.080 0.280337
regionBaltimoreWashington:year2017 2.115e-01 4.713e-02 4.488 7.25e-06 ***
regionBoise:year2017 1.981e-02 4.713e-02 0.420 0.674245
regionBoston:year2017 1.069e-01 4.713e-02 2.268 0.023347 *
regionBuffaloRochester:year2017 -5.596e-02 4.713e-02 -1.187 0.235126
regionCalifornia:year2017 1.189e-01 4.713e-02 2.523 0.011639 *
regionCharlotte:year2017 9.496e-02 4.713e-02 2.015 0.043940 *
regionChicago:year2017 2.117e-01 4.713e-02 4.491 7.12e-06 ***
regionCincinnatiDayton:year2017 1.805e-02 4.713e-02 0.383 0.701811
regionColumbus:year2017 -5.727e-02 4.713e-02 -1.215 0.224378
regionDallasFtWorth:year2017 1.633e-05 4.713e-02 0.000 0.999724
regionDenver:year2017 7.087e-02 4.713e-02 1.504 0.132705
regionDetroit:year2017 -9.829e-02 4.713e-02 -2.085 0.037040 *
regionGrandRapids:year2017 1.123e-01 4.713e-02 2.383 0.017189 *
regionGreatLakes:year2017 -7.076e-04 4.713e-02 -0.015 0.988023
regionHarrisburgScranton:year2017 2.504e-02 4.713e-02 0.531 0.595237
regionHartfordSpringfield:year2017 4.143e-02 4.713e-02 0.879 0.379365
regionHouston:year2017 -4.310e-02 4.713e-02 -0.914 0.360486
regionIndianapolis:year2017 -1.113e-01 4.713e-02 -2.362 0.018208 *
regionJacksonville:year2017 6.935e-02 4.713e-02 1.471 0.141188
regionLasVegas:year2017 -5.010e-02 4.713e-02 -1.063 0.287846
regionLosAngeles:year2017 1.258e-01 4.713e-02 2.669 0.007623 **
regionLouisville:year2017 -3.643e-02 4.713e-02 -0.773 0.439599
regionMiamiFtLauderdale:year2017 1.549e-01 4.713e-02 3.287 0.001016 **
regionMidsouth:year2017 7.014e-02 4.713e-02 1.488 0.136728
regionNashville:year2017 -1.363e-01 4.713e-02 -2.892 0.003836 **
regionNewOrleansMobile:year2017 5.228e-02 4.713e-02 1.109 0.267311
regionNewYork:year2017 6.631e-02 4.713e-02 1.407 0.159498
regionNortheast:year2017 5.123e-02 4.713e-02 1.087 0.277109
regionNorthernNewEngland:year2017 4.724e-03 4.713e-02 0.100 0.920160
regionOrlando:year2017 8.178e-02 4.713e-02 1.735 0.082730 .
regionPhiladelphia:year2017 5.299e-02 4.713e-02 1.124 0.260891
regionPhoenixTucson:year2017 1.635e-02 4.713e-02 0.347 0.728647
regionPittsburgh:year2017 -1.434e-01 4.713e-02 -3.042 0.002355 **
regionPlains:year2017 -2.649e-02 4.713e-02 -0.562 0.574052
regionPortland:year2017 2.843e-02 4.713e-02 0.603 0.546348
regionRaleighGreensboro:year2017 2.202e-01 4.713e-02 4.671 3.01e-06 ***
regionRichmondNorfolk:year2017 2.565e-02 4.713e-02 0.544 0.586360
regionRoanoke:year2017 3.211e-02 4.713e-02 0.681 0.495754
regionSacramento:year2017 2.209e-01 4.713e-02 4.688 2.78e-06 ***
regionSanDiego:year2017 2.112e-01 4.713e-02 4.481 7.46e-06 ***
regionSanFrancisco:year2017 2.458e-01 4.713e-02 5.215 1.86e-07 ***
regionSeattle:year2017 7.805e-02 4.713e-02 1.656 0.097751 .
regionSouthCarolina:year2017 -7.398e-02 4.713e-02 -1.570 0.116516
regionSouthCentral:year2017 -4.827e-02 4.713e-02 -1.024 0.305789
regionSoutheast:year2017 -1.622e-03 4.713e-02 -0.034 0.972549
regionSpokane:year2017 1.051e-01 4.713e-02 2.229 0.025817 *
regionStLouis:year2017 -1.065e-02 4.713e-02 -0.226 0.821183
regionSyracuse:year2017 -3.868e-02 4.713e-02 -0.821 0.411787
regionTampa:year2017 1.636e-01 4.713e-02 3.472 0.000519 ***
regionTotalUS:year2017 8.012e-02 4.713e-02 1.700 0.089167 .
regionWest:year2017 5.313e-02 4.713e-02 1.127 0.259636
regionWestTexNewMexico:year2017 -7.563e-02 4.730e-02 -1.599 0.109859
regionAtlanta:year2018 1.109e-02 7.733e-02 0.143 0.885972
regionBaltimoreWashington:year2018 1.124e-01 7.733e-02 1.454 0.146096
regionBoise:year2018 2.217e-01 7.733e-02 2.866 0.004156 **
regionBoston:year2018 2.060e-01 7.733e-02 2.664 0.007725 **
regionBuffaloRochester:year2018 -2.154e-01 7.733e-02 -2.786 0.005341 **
regionCalifornia:year2018 1.983e-01 7.733e-02 2.564 0.010347 *
regionCharlotte:year2018 9.647e-03 7.733e-02 0.125 0.900720
regionChicago:year2018 2.605e-01 7.733e-02 3.369 0.000756 ***
regionCincinnatiDayton:year2018 1.764e-01 7.733e-02 2.282 0.022523 *
regionColumbus:year2018 7.372e-04 7.733e-02 0.010 0.992394
regionDallasFtWorth:year2018 1.279e-01 7.733e-02 1.655 0.098035 .
regionDenver:year2018 1.960e-01 7.733e-02 2.534 0.011284 *
regionDetroit:year2018 -5.744e-02 7.733e-02 -0.743 0.457661
regionGrandRapids:year2018 1.490e-02 7.733e-02 0.193 0.847176
regionGreatLakes:year2018 5.500e-02 7.733e-02 0.711 0.476957
regionHarrisburgScranton:year2018 -3.205e-02 7.733e-02 -0.414 0.678539
regionHartfordSpringfield:year2018 3.263e-02 7.733e-02 0.422 0.673085
regionHouston:year2018 9.692e-02 7.733e-02 1.253 0.210099
regionIndianapolis:year2018 -7.173e-02 7.733e-02 -0.928 0.353643
regionJacksonville:year2018 5.651e-02 7.733e-02 0.731 0.464972
regionLasVegas:year2018 1.278e-01 7.733e-02 1.653 0.098372 .
regionLosAngeles:year2018 3.021e-01 7.733e-02 3.906 9.41e-05 ***
regionLouisville:year2018 7.641e-02 7.733e-02 0.988 0.323126
regionMiamiFtLauderdale:year2018 6.353e-02 7.733e-02 0.821 0.411391
regionMidsouth:year2018 1.099e-01 7.733e-02 1.421 0.155277
regionNashville:year2018 4.821e-02 7.733e-02 0.623 0.533060
regionNewOrleansMobile:year2018 3.939e-02 7.733e-02 0.509 0.610495
regionNewYork:year2018 3.298e-02 7.733e-02 0.426 0.669761
regionNortheast:year2018 3.333e-02 7.733e-02 0.431 0.666443
regionNorthernNewEngland:year2018 5.080e-02 7.733e-02 0.657 0.511237
regionOrlando:year2018 -4.183e-02 7.733e-02 -0.541 0.588600
regionPhiladelphia:year2018 -3.526e-03 7.733e-02 -0.046 0.963637
regionPhoenixTucson:year2018 1.008e-01 7.733e-02 1.303 0.192425
[ reached getOption("max.print") -- omitted 20 rows ]
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2415 on 18029 degrees of freedom
Multiple R-squared: 0.6447, Adjusted R-squared: 0.6404
F-statistic: 149.4 on 219 and 18029 DF, p-value: < 2.2e-16
model5pf <- lm(average_price ~ type + region + quarter + year + quarter:year, data = trimmed_avocados)
summary(model5pf)
Call:
lm(formula = average_price ~ type + region + quarter + year +
quarter:year, data = trimmed_avocados)
Residuals:
Min 1Q Median 3Q Max
-0.96042 -0.13634 -0.00203 0.13537 1.48398
Coefficients: (3 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.259208 0.014541 86.600 < 2e-16 ***
typeorganic 0.495932 0.003553 139.577 < 2e-16 ***
regionAtlanta -0.223077 0.018461 -12.084 < 2e-16 ***
regionBaltimoreWashington -0.026805 0.018461 -1.452 0.146526
regionBoise -0.212899 0.018461 -11.532 < 2e-16 ***
regionBoston -0.030148 0.018461 -1.633 0.102472
regionBuffaloRochester -0.044201 0.018461 -2.394 0.016662 *
regionCalifornia -0.165710 0.018461 -8.976 < 2e-16 ***
regionCharlotte 0.045000 0.018461 2.438 0.014795 *
regionChicago -0.004260 0.018461 -0.231 0.817490
regionCincinnatiDayton -0.351834 0.018461 -19.058 < 2e-16 ***
regionColumbus -0.308254 0.018461 -16.698 < 2e-16 ***
regionDallasFtWorth -0.475444 0.018461 -25.754 < 2e-16 ***
regionDenver -0.342456 0.018461 -18.550 < 2e-16 ***
regionDetroit -0.284941 0.018461 -15.435 < 2e-16 ***
regionGrandRapids -0.056036 0.018461 -3.035 0.002406 **
regionGreatLakes -0.222485 0.018461 -12.052 < 2e-16 ***
regionHarrisburgScranton -0.047751 0.018461 -2.587 0.009700 **
regionHartfordSpringfield 0.257604 0.018461 13.954 < 2e-16 ***
regionHouston -0.513107 0.018461 -27.794 < 2e-16 ***
regionIndianapolis -0.247041 0.018461 -13.382 < 2e-16 ***
regionJacksonville -0.050089 0.018461 -2.713 0.006669 **
regionLasVegas -0.180118 0.018461 -9.757 < 2e-16 ***
regionLosAngeles -0.345030 0.018461 -18.690 < 2e-16 ***
regionLouisville -0.274349 0.018461 -14.861 < 2e-16 ***
regionMiamiFtLauderdale -0.132544 0.018461 -7.180 7.25e-13 ***
regionMidsouth -0.156272 0.018461 -8.465 < 2e-16 ***
regionNashville -0.348935 0.018461 -18.901 < 2e-16 ***
regionNewOrleansMobile -0.256243 0.018461 -13.880 < 2e-16 ***
regionNewYork 0.166538 0.018461 9.021 < 2e-16 ***
regionNortheast 0.040888 0.018461 2.215 0.026785 *
regionNorthernNewEngland -0.083639 0.018461 -4.531 5.92e-06 ***
regionOrlando -0.054822 0.018461 -2.970 0.002985 **
regionPhiladelphia 0.071095 0.018461 3.851 0.000118 ***
regionPhoenixTucson -0.336598 0.018461 -18.233 < 2e-16 ***
regionPittsburgh -0.196716 0.018461 -10.656 < 2e-16 ***
regionPlains -0.124527 0.018461 -6.745 1.57e-11 ***
regionPortland -0.243314 0.018461 -13.180 < 2e-16 ***
regionRaleighGreensboro -0.005917 0.018461 -0.321 0.748575
regionRichmondNorfolk -0.269704 0.018461 -14.609 < 2e-16 ***
regionRoanoke -0.313107 0.018461 -16.961 < 2e-16 ***
regionSacramento 0.060533 0.018461 3.279 0.001044 **
regionSanDiego -0.162870 0.018461 -8.822 < 2e-16 ***
regionSanFrancisco 0.243166 0.018461 13.172 < 2e-16 ***
regionSeattle -0.118462 0.018461 -6.417 1.43e-10 ***
regionSouthCarolina -0.157751 0.018461 -8.545 < 2e-16 ***
regionSouthCentral -0.459793 0.018461 -24.906 < 2e-16 ***
regionSoutheast -0.163018 0.018461 -8.830 < 2e-16 ***
regionSpokane -0.115444 0.018461 -6.253 4.11e-10 ***
regionStLouis -0.130414 0.018461 -7.064 1.67e-12 ***
regionSyracuse -0.040710 0.018461 -2.205 0.027452 *
regionTampa -0.152189 0.018461 -8.244 < 2e-16 ***
regionTotalUS -0.242012 0.018461 -13.109 < 2e-16 ***
regionWest -0.288817 0.018461 -15.645 < 2e-16 ***
regionWestTexNewMexico -0.296594 0.018502 -16.030 < 2e-16 ***
quarter2 0.021204 0.009058 2.341 0.019248 *
quarter3 0.082991 0.009058 9.162 < 2e-16 ***
quarter4 -0.010357 0.009060 -1.143 0.252944
year2016 -0.117821 0.009058 -13.007 < 2e-16 ***
year2017 -0.056574 0.009058 -6.246 4.31e-10 ***
year2018 -0.004613 0.009245 -0.499 0.617792
quarter2:year2016 -0.028533 0.012810 -2.227 0.025932 *
quarter3:year2016 0.095192 0.012810 7.431 1.12e-13 ***
quarter4:year2016 0.256768 0.012811 20.043 < 2e-16 ***
quarter2:year2017 0.208350 0.012812 16.262 < 2e-16 ***
quarter3:year2017 0.312536 0.012810 24.398 < 2e-16 ***
quarter4:year2017 0.261262 0.012696 20.578 < 2e-16 ***
quarter2:year2018 NA NA NA NA
quarter3:year2018 NA NA NA NA
quarter4:year2018 NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.24 on 18182 degrees of freedom
Multiple R-squared: 0.6461, Adjusted R-squared: 0.6448
F-statistic: 502.9 on 66 and 18182 DF, p-value: < 2.2e-16
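A note on the three NA rows for quarter:year2018 above: R reports coefficients as "not defined because of singularities" when a combination of factor levels has no observations, so the matching dummy column is all zeros. A plausible cause here, which is an assumption and not something the output confirms, is that 2018 only contains quarter 1 data. The sketch below reproduces the behaviour on a small made-up data frame (toy and toy_fit are illustrative names, not objects from this analysis):

```r
# Made-up data for illustration: 2017 has all four quarters, 2018 only quarter 1
set.seed(1)
toy <- rbind(
  expand.grid(year = "2017", quarter = c("1", "2", "3", "4"), rep = 1:10,
              stringsAsFactors = FALSE),
  expand.grid(year = "2018", quarter = "1", rep = 1:10,
              stringsAsFactors = FALSE)
)
toy$year <- factor(toy$year)
toy$quarter <- factor(toy$quarter, levels = c("1", "2", "3", "4"))
toy$price <- 1 + 0.1 * as.integer(toy$quarter) + rnorm(nrow(toy), sd = 0.2)

# Cross-tabulate the two factors: the zero cells are the combinations with no data
table(toy$year, toy$quarter)

# Same structure as model5pf, minus type and region
toy_fit <- lm(price ~ quarter + year + quarter:year, data = toy)

# Names of the coefficients that could not be estimated
names(coef(toy_fit))[is.na(coef(toy_fit))]
```

On the real data, trimmed_avocados %>% count(year, quarter) would show whether the same empty cells exist.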
So it looks like model5pa, with type, region, quarter, year and the type:region interaction, is the best, with a moderate gain in multiple R-squared due to the interaction. However, the individual interaction coefficients have a wide range of p-values, so we need to test the significance of the interaction term as a whole.
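One common way to test an interaction as a whole is a nested-model F-test with anova(), comparing the model without the interaction against the model that includes it. The sketch below is self-contained and runs on simulated data, so toy_prices, toy_main and toy_inter are made-up names, and the effect sizes are invented purely for illustration:

```r
set.seed(42)

# Simulated stand-in data: 2 types x 3 regions, 50 rows per combination
toy_prices <- expand.grid(
  type = c("conventional", "organic"),
  region = c("A", "B", "C"),
  rep = 1:50
)

# Build in a genuine interaction: the organic premium is larger in region C
toy_prices$price <- 1 +
  0.5 * (toy_prices$type == "organic") +
  0.5 * (toy_prices$type == "organic" & toy_prices$region == "C") +
  rnorm(nrow(toy_prices), sd = 0.2)

# Main-effects model and the model that adds the interaction
toy_main  <- lm(price ~ type + region, data = toy_prices)
toy_inter <- lm(price ~ type + region + type:region, data = toy_prices)

# Nested-model F-test: a small Pr(>F) suggests the interaction terms
# jointly improve the fit beyond what the main effects explain
interaction_test <- anova(toy_main, toy_inter)
interaction_test
```

On the avocado data, the same call would compare the model fitted with type + region + quarter + year against model5pa.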
Neat, it looks like including the interaction is statistically justified. So we can keep it in. And our final model is:
average_price ~ type + region + quarter + year + type:region